In the modern world, not only is software getting larger and more complex, it is also
becoming  pervasive  in  our  daily  lives.  On  the  one  hand,  the  advent  of  multi-core  processors  is 
pushing  software  towards  becoming  more  concurrent,  making  it  more  complex.  On  the  other 
hand,  software  is  everywhere,  inside  nuclear  reactors,  space  shuttles,  cars,  traffic  signals,  cell 
phones,  etc.  To  meet  this  demand  for  software,  we  need  to  invest  in  automated  program- 
verification  techniques,  which  ensure  that  software  will  always  behave  as  intended. 

The  problem  of  program  verification  is  undecidable.  A  verification  technique  can  only 
gain  a  limited  amount  of  knowledge  about  a  program’s  behavior  by  reasoning  about  certain 
aspects  of  the  program.  This  dissertation  addresses  program  verification  by  considering  two 
important features of programs: (i) procedures (and procedure calls) and (ii) concurrency.

Interprocedural  Analysis:  An  analysis  that  can  precisely  handle  the  procedural  aspect 
of  programs  is  called  an  interprocedural  analysis.  Procedures  are  an  important  feature  of 
most  programming  languages  because  they  allow  for  modular  design  of  programs:  each 
procedure  is  meant  to  perform  a  task,  and  they  can  be  put  together  to  implement  more 
complex  functionality.  Because  procedures  serve  as  a  natural  abstraction  mechanism  for 
developers  to  organize  their  programs,  an  interprocedural  analysis  can  leverage  them  to 
enable verification of larger and more complex programs.

There  is  a  long  history  of  work  on  interprocedural  analysis,  including  several  frameworks 
that  support  a  variety  of  different  program  abstractions,  and  provide  algorithms  for  analyzing 
them.  The  advantage  of  having  a  framework  is  that  any  program  abstraction  that  fits  the 
framework can make use of the algorithms for the framework. One such framework, called
Weighted  Pushdown  Systems  (WPDSs),  was  the  subject  of  the  research  reported  on  in  this 
dissertation. 

The  dissertation  makes  several  contributions  to  interprocedural  analyses  that  are  based 
on  WPDSs: 

•  We  define  the  Extended  WPDS  (EWPDS)  model,  which  removes  a  crucial  limitation 
of  WPDSs  by  providing  a  convenient  abstraction  mechanism  for  local  variables  of  a 
procedure.  Using  EWPDSs,  it  is  possible  to  model  a  program’s  behavior  more  precisely 
than  with  WPDSs.  In  our  work,  we  used  EWPDSs  for  checking  properties  of  Boolean 
programs; computing affine relations in x86 programs; building debugging tools; computing
alias pairs in programs with single-level pointers; and for checking properties of
concurrent  programs  (where  EWPDSs  are  used  to  model  individual  threads). 

•  We  use  graph-theoretic  algorithms  to  speed  up  the  analysis  algorithms  for  WPDSs  and 
EWPDSs.  This  results  in  immediate  speedup  in  all  of  the  applications  based  on  these 
models  without  requiring  any  tuning  for  a  particular  application.  The  speedups  ranged 
from 1.8x to 3.6x.

•  We  show  how  to  answer  more  expressive  queries  on  EWPDSs,  such  as  computing  the 
set  of  all  error  traces  in  the  model,  called  an  error  projection.  This  enables  faster 
verification. 

Concurrency:  The  advent  of  multi-core  processors  is  pushing  software  to  become  more 
concurrent.  Concurrent  programs  are  not  only  difficult  to  write,  but  are  also  difficult  to 
analyze  and  verify.  One  reason  is  that  the  interprocedural  analysis  of  concurrent  programs 
is undecidable, even when all of the other aspects of a program (like the program heap,
non-scalar  variables,  pointers,  etc.)  are  abstracted  away.  As  a  result,  most  verification  tools 
do  not  mix  interprocedural  analysis  with  concurrency,  i.e.,  tools  that  analyze  concurrent 
programs  give  up  on  precise  handling  of  procedures.  This  is  unfortunate  because  precise 
handling  of  procedures  has  proven  to  be  very  useful  for  the  analysis  of  sequential  programs. 

The contribution of our work is to give techniques for interprocedural analysis for concurrent
programs. We show that one does not have to design new algorithms for concurrent
programs; instead, it is possible to automatically extend most interprocedural analysis techniques
for sequential programs to perform interprocedural analysis of concurrent programs.

As  mentioned  earlier,  interprocedural  analysis  of  concurrent  programs  is  undecidable. 
We sidestep the undecidability by placing a bound on the number of context switches, i.e.,
we bound the number of times control is transferred from one thread to another. We call the
analysis of concurrent programs under a bound on the number of context switches context-bounded
analysis (CBA).

CBA  is  an  interesting  avenue  of  research  that  has  attracted  a  lot  of  attention  recently 
because  empirical  evidence  suggests  that  many  concurrency-related  bugs  can  be  found  in  a 
few  context  switches.  Moreover,  CBA  was  shown  to  be  decidable  for  finite-data  abstractions. 

The dissertation makes two important contributions to interprocedural analysis of concurrent
programs:

•  We  show  that  if  each  thread  is  modeled  using  a  WPDS  then  CBA  is  decidable,  and  also 
give  an  algorithm  for  performing  CBA.  This  represents  the  first  step  towards  providing 
a  general  model  for  concurrent  programs  that  can  be  used  to  perform  interprocedural 
analysis. 

• We show that, given a concurrent program P and a context bound K, one can create
a sequential program P_K such that the analysis of P_K is sufficient for CBA of P under
the bound K. This reduction is a source-to-source transformation, and requires no
assumptions or extra work on the part of the user, except for the identification of
thread-local data. We implemented this technique to create the first known implementation
of CBA. Using this tool, we conducted a study on concurrent Linux drivers to
show that most bugs could not only be found in a few context switches, but, compared
to previous approaches, they could be found much faster using our approach.

Chapter 1
Introduction 

In  the  modern  world,  not  only  is  software  getting  larger  and  more  complex,  it  is  also 
becoming  pervasive  in  our  daily  lives.  On  the  one  hand,  the  advent  of  multi-core  processors  is 
pushing  software  towards  becoming  more  concurrent,  making  it  more  complex.  On  the  other 
hand,  software  is  everywhere,  inside  nuclear  reactors,  space  shuttles,  cars,  traffic  signals,  cell 
phones,  etc.  In  meeting  this  demand  for  software,  ensuring  reliability  is  one  of  the  major 
bottlenecks.  Any  approach  for  ensuring  reliability  that  requires  a  substantial  manual  effort 
is  not  going  to  suffice  in  the  future,  and  we  need  to  invest  in  automated  program  verification 
techniques. 

The  goal  of  program  verification  is  to  inspect  program  behavior,  and  then  conclude  if 
some  program  execution  can  be  faulty,  or  if  there  are  no  faulty  executions.  One  key  ingredient 
needed for program verification is a property that classifies if a program execution is faulty
or  not.  The  property  of  interest  can,  for  example,  state  that  there  are  no  null-pointer 
dereferences  or  memory-safety  violations,  or  be  more  functional  and  state  that  the  result  of 
executing  a  procedure  on  an  array  is  that  it  sorts  the  array.  Thus,  in  program  verification, 
given a program P and a property A, one has to answer the question: “is there an execution
of P that violates property A?”. The answer can be in the form of the violating execution,
or  a  proof  that  the  property  holds  for  all  executions  of  P. 

Program verification is undecidable, i.e., no single tool can always give an answer to the
above question for any given program and any property. Consequently, program-verification
research focuses on developing algorithms and tools that can only infer a few aspects of
a program’s behavior. Such tools find answers for as many properties as possible under
their limited knowledge of the program’s behavior. Most tools are based on the notion of
abstraction.

Figure 1.1 Typical design of verification tools based on abstraction.

1.1  The  Need  for  Abstraction 

A  common  organization  of  a  verification  tool  is  shown  in  Fig.  1.1.  It  has  two  main  phases. 
The  first  is  an  abstraction  phase  that  creates  an  abstract  model  of  a  given  program.  The 
set  of  behaviors  of  this  model  is  a  superset  of  the  set  of  behaviors  of  the  original  program. 
In  other  words,  the  abstract  model  over-approximates  the  original  program.  An  example  is 
discussed  in  Section  1.1.2. 

The  second  phase  is  the  analysis  phase,  which  checks  if  the  abstract  model  can  violate  the 
property  of  interest.  This  check,  in  essence,  is  to  see  if  the  set  of  behaviors  of  the  abstract 
model  is  disjoint  from  the  set  of  bad  behaviors  described  by  the  property,  as  depicted  in 
Fig.  1.2.  If  so,  one  can  conclude  that  the  original  program  has  no  bad  behaviors.  Otherwise, 
some of the behaviors in their intersection are reported (which may or may not be actual
behaviors of the original program; see Fig. 1.4).

The  reason  for  the  separation  of  the  two  phases  is  that  the  set  of  behaviors  of  a  program, 
in  general,  is  not  computable,  but  is  computable  for  the  abstract  model.1  In  the  model 
checking  community,  the  first  phase  is  called  model  extraction  and  the  second  phase  is  called 
model  checking. 

1 In some cases, it may not be computable even for the abstract model, in which case the analysis phase
further  over-approximates  the  set  of  behaviors  of  the  abstract  model. 

Figure 1.2 Proving the absence of bugs using abstraction.


Note  that  what  we  have  described  here  is  the  common  approach  taken  for  program 
verification  in  which  the  set  of  program  behaviors  is  over-approximated.  The  approach  taken 
by  some  bug-finding  tools  is  to  under-approximate  the  set  of  program  behaviors.  Then 
any  intersection  between  this  set  and  the  set  of  bad  behaviors  described  by  the  property 
immediately  indicates  the  presence  of  a  bug.  In  this  dissertation,  we  will  mostly  stick  to  the 
verification  approach. 

1.1.1  Abstraction  Refinement 

It  is  possible  that  the  chosen  abstraction  is  too  coarse  to  prove  that  a  property  holds, 
i.e.,  the  set  of  bad  behaviors  is  not  disjoint  from  the  set  of  behaviors  of  the  abstract  model, 
but  is  disjoint  from  the  set  of  behaviors  of  the  program.  In  this  case,  abstraction  refinement 
can  be  used:  multiple  abstractions  with  increasing  precision  are  used  until  a  bug  is  found  or 
the property is proved to hold. In Fig. 1.4, abstractions A1 and A2 are not precise enough
to  prove  that  the  property  holds,  but  abstraction  A3  suffices. 

The  design  of  a  verification  tool  based  on  abstraction  refinement  is  shown  in  Fig.  1.3.  The 
result  of  the  analysis  phase  is  used  to  refine  the  abstraction  when  necessary:  if  the  current 
abstract  model  does  not  violate  the  property,  then  the  analysis  phase  concludes  that  the 
program is correct; otherwise, it produces a behavior of the model, called a counterexample,
that  violates  the  property.  If  this  is  also  a  behavior  of  the  original  program,  then  a  bug  has 
been  found.  Otherwise,  the  abstraction  is  refined  to  produce  a  model  that  does  not  exhibit 
this behavior, and the process continues. Because verification is undecidable, it is possible
that the refinement loop may fail to terminate; i.e., the abstraction is made more and more
precise, but it always fails to show the absence of a bug, or to produce an actual bug.

Figure 1.4 Proving the absence of bad behaviors using abstraction refinement.
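
The refinement loop described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: behaviors are plain strings, every behavior set is finite and can be enumerated explicitly (which is exactly what does not hold for real programs), and the names refine_until_decided, concrete, bad, and abstractions are hypothetical rather than taken from any tool discussed in this dissertation.

```python
def refine_until_decided(concrete, bad, abstractions):
    """Try successively more precise abstractions, in the given order.

    concrete     -- behaviors of the original program (assumed enumerable here)
    bad          -- behaviors that violate the property
    abstractions -- over-approximations of `concrete`, coarsest first
    """
    for round_number, abstract in enumerate(abstractions, start=1):
        # Soundness requirement: the model must over-approximate the program.
        assert concrete <= abstract, "not an over-approximation"

        counterexamples = abstract & bad
        if not counterexamples:
            # The model has no bad behavior, so neither does the program.
            return ("proved", round_number)

        real_bugs = counterexamples & concrete
        if real_bugs:
            # A counterexample that the program itself can exhibit.
            return ("bug", min(real_bugs))
        # Otherwise every counterexample is spurious: refine and retry.

    # Out of abstractions; a real tool might instead loop forever.
    return ("unknown", None)


concrete = {"t1", "t2"}
bad = {"e1", "e2", "t9"}
a1 = {"t1", "t2", "e1", "e2", "x"}   # too coarse
a2 = {"t1", "t2", "e2"}              # still too coarse
a3 = {"t1", "t2"}                    # precise enough

print(refine_until_decided(concrete, bad, [a1, a2, a3]))
print(refine_until_decided({"t1", "t9"}, bad, [{"t1", "t9", "e1"}]))
```

The first call mirrors the situation of Fig. 1.4, where two abstractions are too coarse and the third suffices. In a real tool the set of program behaviors is not available; feasibility of a single counterexample is checked instead.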

1.1.2  Example:  Predicate  Abstraction  and  Boolean  Programs 

In  this  section,  we  describe  how  predicate  abstraction  is  used  to  create  abstract  models  of 
programs,  called  Boolean  programs.  We  also  illustrate  how  abstraction,  as  well  as  abstraction 
refinement,  helps  in  verification.  The  examples  used  in  this  section  are  taken  from  [7],  and 
will  be  used  later  in  Chapter  5  to  illustrate  some  of  the  contributions  of  this  dissertation. 

Consider  the  program  P  shown  in  the  leftmost  column  of  Fig.  1.5.  We  would  like  to 
verify  that  the  assertion  shown  on  line  10  can  never  fail.  Because  the  assertion  will  always 
fail  when  executed,  we  essentially  have  to  show  that  line  10  is  never  reached  in  any  program 
execution. 

P                             B1                    B2                    B3

numUnits : int;               -                     nUO: bool;            nUO: bool;
level : int;

void getUnit() {              void getUnit() {      void getUnit() {      void getUnit() {
[1]   canEnter: bool := F;    [1]   ...             [1]   ...             [1]   cE: bool := F;
[2]   if (numUnits = 0) {     [2]   if (?) {        [2]   if (nUO) {      [2]   if (nUO) {
[3]     if (level > 10) {     [3]     if (?) {      [3]     if (?) {      [3]     if (?) {
[4]       NewUnit();          [4]       ...         [4]       ...         [4]       ...
[5]       numUnits := 1;      [5]       ...         [5]       nUO := F;   [5]       nUO := F;
[6]       canEnter := T;      [6]       ...         [6]       ...         [6]       cE := T;
        }                             }                     }                     }
      } else                        } else                } else                } else
[7]     canEnter := T;        [7]     ...           [7]     ...           [7]     cE := T;
[8]   if (canEnter)           [8]   if (?)          [8]   if (?)          [8]   if (cE)
[9]     if (numUnits = 0)     [9]     if (?)        [9]     if (nUO)      [9]     if (nUO)
[10]      assert(F);          [10]      ...         [10]      ...         [10]      ...
        else                          else                  else                  else
[11]      gotUnit();          [11]      ...         [11]      ...         [11]      ...
}                             }                     }                     }

Figure 1.5 An example program P and its abstractions as Boolean programs. The “...”
represents a “skip” or a no-op.

One simple abstraction of P is the program B1 in Fig. 1.5. This program only retains
the control-flow structure of P and all other data is abstracted away. Consequently, the
branch conditions are non-deterministic, meaning that the branch may go either way in an
execution. Other program statements are abstracted to a “skip” because the data being
manipulated by those statements is not present in B1. (The abstract model B1 can also be
thought of as the control-flow graph of P.) It is easy to see that B1 over-approximates P.
B1 is very simple to analyze; however, it fails to show that line 10 is unreachable because
there is an execution of B1 that reaches line 10.

Program B2 is a more precise model of P, as compared to B1. It retains the control
structure of P, but additionally, it keeps track of the value of the predicate {numUnits = 0}
using the Boolean variable nUO as follows: if at some point in the execution of P, the
predicate {numUnits = 0} holds (does not hold), then the value of nUO in the corresponding
execution of B2 will be true (false). B2 is still an over-approximation of P because some
branch conditions cannot be decided using just the value of nUO. However, line 10 is reachable
in B2 as well.

Program B3 is an even more precise model of P, as compared to B2. It keeps track of two
predicates: {numUnits = 0} (using variable nUO) and {canEnter = T} (using variable cE).
B3 is an over-approximation of P because the variable level is still abstracted away. Line
10 is not reachable in B3, which proves that the assertion holds for P.

The process of iterating through the abstract models B1, B2, and B3 is an example of how
abstraction refinement is used.2 Each of these three models is a Boolean program, that is,
an imperative program, possibly with procedure calls, whose only variables are Boolean variables
or fixed-size vectors of Boolean variables (and no heap). The process of creating Boolean
programs from ordinary (executable) programs by keeping track of certain predicates is called
predicate abstraction.
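
To make the comparison of B1, B2, and B3 concrete, the following Python sketch explores the reachable states of the three Boolean programs by brute force. It is an illustration, not the algorithm used by any tool mentioned here. The function names, the state layout (pc, nUO, cE), and the pseudo-line 12 that stands for procedure exit are assumptions of the sketch, and line 6 of B3 is taken to be the abstraction of canEnter := T, that is, cE := T.

```python
EXIT = 12  # pseudo-line: control has left getUnit()


def successors(state, track_nuo, track_ce):
    """Successors of a state (pc, nUO, cE) of the chosen Boolean program.

    If a predicate is not tracked, tests on it become "if (?)" and
    assignments to it become skips.
    """
    pc, nuo, ce = state

    def targets(value, tracked, then_pc, else_pc):
        if tracked:
            return [then_pc if value else else_pc]
        return [then_pc, else_pc]          # non-deterministic branch

    if pc == 1:                            # canEnter := F
        return [(2, nuo, False if track_ce else ce)]
    if pc == 2:                            # if (numUnits = 0)
        return [(t, nuo, ce) for t in targets(nuo, track_nuo, 3, 7)]
    if pc == 3:                            # if (level > 10), never tracked
        return [(4, nuo, ce), (8, nuo, ce)]
    if pc == 4:                            # NewUnit()
        return [(5, nuo, ce)]
    if pc == 5:                            # numUnits := 1
        return [(6, False if track_nuo else nuo, ce)]
    if pc in (6, 7):                       # canEnter := T
        return [(8, nuo, True if track_ce else ce)]
    if pc == 8:                            # if (canEnter)
        return [(t, nuo, ce) for t in targets(ce, track_ce, 9, EXIT)]
    if pc == 9:                            # if (numUnits = 0)
        return [(t, nuo, ce) for t in targets(nuo, track_nuo, 10, 11)]
    if pc == 11:                           # gotUnit()
        return [(EXIT, nuo, ce)]
    return []                              # line 10 and EXIT are terminal


def reachable_lines(track_nuo, track_ce):
    """Set of line numbers reachable from line 1 in the chosen abstraction."""
    # numUnits is unconstrained on entry, so a tracked nUO starts as T or F.
    # Untracked variables are pinned to False so they add no states.
    start_values = (True, False) if track_nuo else (False,)
    work = [(1, nuo, False) for nuo in start_values]
    seen = set(work)
    while work:
        for nxt in successors(work.pop(), track_nuo, track_ce):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return {pc for pc, _, _ in seen}


b1 = reachable_lines(track_nuo=False, track_ce=False)
b2 = reachable_lines(track_nuo=True, track_ce=False)
b3 = reachable_lines(track_nuo=True, track_ce=True)
print(10 in b1, 10 in b2, 10 in b3)
```

Under these assumptions, line 10 is reachable in the encodings of B1 and B2 but not in that of B3, matching the discussion above.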

1.2  Challenges  in  Verification  of  Programs 

The  previous  section  gave  an  example  for  the  abstraction  phase  of  program  verification. 
In  this  section,  we  discuss  the  analysis  phase.  The  design  of  the  analysis  phase  depends 
heavily  on  the  kind  of  abstract  model  that  is  created.  We  discuss  two  features  of  programs 
that  are  important  to  retain  in  abstract  models,  but  also  pose  challenges  for  the  analysis 
phase.  The  first  feature  is  procedures  and  procedure  calls.  An  analysis  that  can  precisely 
handle  the  procedural  aspect  of  programs  is  called  an  interprocedural  analysis.  The  second 
feature  is  concurrency. 

1.2.1  Interprocedural  Analysis 

Procedures are an important feature of most programming languages because they allow
for modular design of programs: each procedure is meant to perform a task, and they can
be put together to implement more complex functionality. Because procedures serve as a
natural abstraction mechanism for developers to organize their programs, it is important
that they be retained in the abstract model. However, this makes designing the analysis for
the model more challenging.

2 See [7] for a description of how the SLAM tool systematically creates these models.

The  difficulty  posed  by  interprocedural  analysis  is  that  it  requires  precise  reasoning  about 
the  program’s  runtime  stack,  which  can  be  unbounded  in  size  because  of  recursion.  This 
induces  an  infinite  control-state  space,  even  for  Boolean  programs  (i.e.,  when  there  is  no  heap 
and all variables are Booleans or vectors of Booleans). Thus, straightforward techniques of
enumerating  the  state  space  of  the  model  do  not  work  in  the  presence  of  recursion.  Even 
in  the  absence  of  recursion,  the  state  space  of  a  model  is  exponential  in  the  maximum  call 
depth  that  can  arise  in  an  execution  of  the  model. 

We  now  describe  some  of  the  common  ways  of  approximating  analysis  of  programs  with 
multiple  procedures,  and  show  how  they  fail  to  prove  even  very  simple  properties  of  programs. 
(For  ease  of  discussion,  we  consider  the  analysis  of  C  programs  directly,  instead  of  an  abstract 
model.) 

Consider the program shown in Fig. 1.6(a). It consists of a single recursive procedure foo
that  manipulates  the  array  arr.  This  procedure  is  intended  to  operate  in  a  multithreaded 
environment  in  which  other  threads  may  also  be  accessing  arr.  The  programmer  intends 
this  procedure  to  be  free  of  data  races,  i.e.,  no  two  threads  should  be  allowed  to  access 
arr  simultaneously.  This  is  enforced  in  the  program  by  having  the  same  mutex  protect 
accesses  to  the  same  array  element  (m[i]  protects  arr[i]).  Proving  this  invariant  requires 
an  interprocedural  analysis  because  one  must  reason  about  local  variables  stored  on  the  stack 
to  establish  the  invariant  that  in  any  execution,  for  all  activation  records  in  the  runtime  stack, 
l1 == m[l2]. Once this is established, it is easy to conclude that arr[l2] is always protected
by m[l2].

There are two common ways of approximating interprocedural analysis of programs.
The program from Fig. 1.6(a) shows that neither is sufficient. The first option is to inline
procedures (similar to a compiler’s function-inlining optimization) until only one procedure
remains in the program. Because foo() is recursive, inlining does not help here: one can
never get to the point where only a single procedure remains.

    mutex m[N]
    int arr[N]

    foo() {
1     if (?) {
2       i = ...
3       l1 = m[i]
4       l2 = i
5       foo()
6       acquire(l1)
7       arr[l2] = ...
8       release(l1)
      }
    }

    (a)

    if (?) {                bar() {
      l1 = m[i]               g2 = g
      g = i                   g = 0
    } else {                }
      l1 = m[i+1]
      g = i+1
    }
    bar()
    l2 = g2

    (b)

Figure 1.6 An example program. In (a) l1 and l2 are local variables; in (b) g and g2 are
global variables.

The other option is to “short-circuit” an invariant on local variables across a procedure
call, i.e., project out the local variables from any invariant I that holds before a procedure
call to obtain an invariant I_l on only the local variables, and then assert that I_l holds after
the procedure call. (For this discussion, we assume that local variables of a procedure cannot
be accessed by any called procedure.) Such an approach would be sufficient for this example:
it would establish that (l1 == m[l2]) holds before the recursive call to foo(); hence, it must
also hold after the call because l1 and l2 are local variables and m is not modified in foo().
This approach is only a heuristic and may lead to imprecision when the intermediate invariant
involves global variables that can possibly be modified by a called procedure. Assume that
lines 3 and 4 are replaced by the snippet of code in Fig. 1.6(b). Before the call to bar(),
the invariant is I1 = (l1 == m[g]), whereas after the call, it becomes I2 = (l1 == m[g2]).
Short-circuiting the invariant I1 across the call to bar() produces an invariant that says
that l1 equals some entry in m. This information is insufficient to conclude that the program
behaves correctly. Establishing I2 after the call to bar() requires an interprocedural analysis
that tracks the invariant between l1 and g through bar().

Figure 1.7 Two different interleavings of the same program trace. The first one runs
correctly, while the second one may crash because the memory dereference “array[n - 1]” is
out of bounds.

1.2.2  Analysis  of  Concurrent  Programs 

As mentioned previously, the advent of multi-core processors is pushing software to become
more concurrent. Concurrent programs are not only difficult to write, but are also
difficult  to  analyze  and  verify. 

In industry, the most prevalent way of finding bugs in programs is testing, where programs
are executed under fixed input for which the output is known a priori. If executing a
program fails to produce the desired output, then there must be a bug in the program.
Compared to verification, testing has the disadvantage of being incomplete: finding bugs depends
crucially on choosing the right set of inputs for the program. Furthermore, the presence of
concurrency greatly increases this incompleteness because it adds non-determinism to the
program. Even under fixed input, a program can have a huge number of behaviors depending
on the interleaving that occurs between different threads. For example, see Fig. 1.7. Because
the interleavings are not in the control of the programmer, bugs that only arise on specific
interleavings are especially hard to find. This motivates the need for verification tools that
can not only prove properties for all program inputs, but also prove them for all interleavings
that may happen during program execution.
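
The effect of interleavings under a fixed input can be illustrated with a small Python sketch. The two threads below are a hypothetical stand-in in the spirit of Fig. 1.7, not the program from that figure: a reader records the length of a shared list and then reads its last element, while a writer removes one element. All names (interleavings, run, reader, writer) are assumptions of the sketch. Two threads with m and n atomic steps have C(m+n, m) interleavings, which the sketch enumerates exhaustively.

```python
from math import comb


def interleavings(a, b):
    """Yield every merge of a and b that preserves the order within each."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def read_length(state):        # reader, step 1: n = len(array)
    state["n"] = len(state["array"])


def read_last(state):          # reader, step 2: last = array[n - 1]
    state["last"] = state["array"][state["n"] - 1]


def remove_one(state):         # writer, single step: shrink the array
    state["array"].pop()


def run(schedule):
    """Execute one interleaving on a fresh copy of the fixed input."""
    state = {"array": [10, 20], "n": None, "last": None}
    try:
        for step in schedule:
            step(state)
    except IndexError:
        return "crash"
    return "ok"


reader = [read_length, read_last]
writer = [remove_one]
outcomes = [run(schedule) for schedule in interleavings(reader, writer)]
print(outcomes, len(outcomes) == comb(len(reader) + len(writer), len(writer)))
```

Only the schedule in which the writer runs between the two reader steps crashes, so a test that happens to execute either of the other two schedules would miss the bug.
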

One may wonder why the analysis of concurrent programs is considered harder than the
analysis of sequential programs. The answer is obvious in the case of testing because sequential
programs can exhibit only one behavior with a fixed input, but concurrent programs may
exhibit  several  behaviors.  However,  in  terms  of  verification,  the  answer  is  not  that  obvious 
because  the  analysis  of  both  sequential  and  concurrent  programs  has  to  consider  a  possibly 
unbounded  number  of  behaviors  anyway.  Also,  in  general,  the  analysis  of  both  sequential 
and  concurrent  programs  is  undecidable. 

The  answer  lies  in  the  way  we  do  verification.  As  mentioned  before,  verification  has 
two  main  steps:  abstraction  to  an  abstract  model,  and  then  an  analysis  of  the  model.  The 
complication  introduced  by  concurrency  is  that  even  with  very  simple  abstract  models,  the 
presence  of  concurrency  makes  their  analysis  computationally  expensive. 

For  instance,  the  analysis  of  sequential  Boolean  programs  can  be  carried  out  in  time 
linear  in  the  size  of  the  program  (but  exponential  in  the  number  of  variables).  However, 
when,  for  a  concurrent  program  with  procedures,  each  thread  is  abstracted  to  a  Boolean 
program,  the  analysis  is  undecidable,  even  for  two  threads.  The  situation  is  similar  even  in 
the  absence  of  procedures:  the  analysis  of  concurrent  Boolean  programs  without  procedures 
is PSPACE-complete, i.e., expected to have a running time exponential in the size of the
programs  (as  opposed  to  linear  in  the  case  of  sequential  Boolean  programs). 

This  result  shows  that  a  finite  abstraction  of  program  data  (into  Boolean  variables)  is 
not  sufficient  to  design  effective  verification  algorithms  for  concurrent  programs.  There  is 
hope  if  the  program  control  is  abstracted,  i.e.,  if  the  procedures  are  abstracted  so  that  the 
resulting  abstract  model  only  has  a  single  procedure,  then  the  abstract  model  can  be  analyzed 
precisely  (albeit  in  exponential  time).  Because  of  this  result,  verification  tools  give  up  on 
precise  handling  of  procedures  while  dealing  with  concurrent  programs;  i.e.,  they  do  not  mix 
interprocedural  analysis  with  concurrency.  This  is  unfortunate  because  precise  handling  of 
procedures  has  proven  to  be  very  useful  for  analysis  of  sequential  programs. 

1.3 Contributions and Organization of the Dissertation

The  dissertation  makes  several  contributions  in  two  main  directions.  First,  it  gives  new 
algorithms  and  techniques  for  interprocedural  analysis  of  sequential  programs  (Chapters  3, 
4,  and  5).  Second,  it  shows  how  interprocedural  analysis  can  be  carried  out  in  the  presence  of 
concurrency  (Chapters  6  and  7).  Background  material  that  our  work  builds  upon  is  covered 
in  Chapter  2. 

1.3.1  New  Technology  for  Sequential  Programs 

It  is  important  to  choose  an  expressive  abstract  model  so  that  it  can  retain  enough  of 
the  important  aspects  of  the  program  to  be  able  to  prove  the  desired  property.  Boolean 
programs  can  encode  models  with  infinite  state  spaces,  but  the  infiniteness  can  only  come 
from  the  runtime  stack.  It  is  restricted  to  finite  abstractions  of  program  data.  Weighted 
pushdown  systems  (WPDSs)  are  strictly  more  expressive  models  than  Boolean  programs. 
They  can  encode  infinite-state  abstractions  of  data  as  well. 

WPDSs are based on pushdown systems (PDSs), which are essentially finite-state machines
equipped with a stack. PDSs are expressive enough to encode the interprocedural
control flow of a program by using the PDS stack to encode the runtime stack of the program.
PDSs can also encode Boolean programs, but the encoding is not very efficient: the
size  of  a  PDS  encoding  a  Boolean  program  B  will  be  exponential  in  the  number  of  variables 
of  B. 
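
As a rough illustration of how a PDS stack can mirror a runtime stack, the Python sketch below encodes the control flow of a single recursive procedure. The encoding is one common convention and an assumption of this sketch, since this section does not spell one out: stack symbols are program points, a call rule pushes the callee's entry point above the return point, and a return rule pops. The rule table, the symbol names, and the function explore are hypothetical.

```python
from collections import deque

# Rules map (control state, top-of-stack symbol) to a list of
# (new control state, symbols that replace the top of the stack).
RULES = {
    ("p", "foo_entry"): [("p", ("call_site",)), ("p", ("foo_exit",))],
    # Call: push the callee's entry point above the return point.
    ("p", "call_site"): [("p", ("foo_entry", "after_call"))],
    ("p", "after_call"): [("p", ("foo_exit",))],
    # Return: pop the current procedure's symbol.
    ("p", "foo_exit"): [("p", ())],
}


def explore(initial, max_stack):
    """Breadth-first search over configurations (state, stack).

    Configurations whose stack is deeper than max_stack are discarded,
    so this is a bounded under-approximation of the reachable set.
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        state, stack = queue.popleft()
        if not stack:
            continue                      # empty stack: nothing to rewrite
        top, rest = stack[0], stack[1:]
        for new_state, pushed in RULES.get((state, top), []):
            config = (new_state, pushed + rest)
            if len(config[1]) <= max_stack and config not in seen:
                seen.add(config)
                queue.append(config)
    return seen


start = ("p", ("foo_entry",))
for bound in (2, 4, 8):
    configs = explore(start, bound)
    deepest = max(len(stack) for _, stack in configs)
    print(bound, len(configs), deepest)
```

However large the bound is chosen, some configuration reaches it, which is why plain enumeration of configurations cannot terminate on recursive programs without such a cutoff.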

WPDSs are a generalization of PDSs. WPDSs extend PDSs by adding a general “black-box”
abstraction for expressing transformations of a program’s data state (through weights).
Thus, the common strategy of encoding a program abstraction as a WPDS is to encode the
interprocedural control-flow graph (ICFG) of the program using a PDS and the data
transformations induced by the program statements as weights. WPDSs generalize other frameworks
for interprocedural analysis, such as the Sharir-Pnueli functional approach [88], as well as
the Knoop-Steffen [52] and Sagiv-Reps-Horwitz summary-based approaches [84].
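
The idea of weights as data transformers can be sketched as follows. In the WPDS literature, weights usually come with two operations, commonly called extend (sequential composition along a path) and combine (merging alternative paths), together with a neutral weight for each; the formal definition is not given in this section, so the Python code below is an assumption-laden illustration, not the dissertation's definition. Here a weight is a relation over the values of a single Boolean variable, and the helper names assign and assume are hypothetical.

```python
VALUES = (False, True)


def extend(w1, w2):
    """Sequential composition: apply w1, then w2."""
    return frozenset((a, c) for (a, b1) in w1 for (b2, c) in w2 if b1 == b2)


def combine(w1, w2):
    """Merge of two alternative paths."""
    return w1 | w2


ONE = frozenset((v, v) for v in VALUES)   # identity transformer (a skip)
ZERO = frozenset()                        # infeasible path


def assign(value):
    """Transformer of 'x := value'."""
    return frozenset((v, value) for v in VALUES)


def assume(value):
    """Transformer of a branch that is taken only when x equals value."""
    return frozenset({(value, value)})


# Two alternative paths: "x is true, then x := F" or "x is false".
then_path = extend(assume(True), assign(False))
else_path = assume(False)
summary = combine(then_path, else_path)

# After either path, a branch that requires x to be true is infeasible.
print(extend(summary, assume(True)) == ZERO)
```

The summary relates every initial value of x to the final value False, so composing it with a test that requires x to be true yields the infeasible weight.
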

One advantage of using WPDSs is that one can make use of any of several algorithms
that  exist  for  analyzing  them  [83].  Thus,  in  order  to  design  a  new  verification  tool,  one 
only  has  to  encode  the  program  as  a  WPDS  and  then  the  analysis  part  is  available  for  free. 
In  particular,  there  are  algorithms  that  compute  the  set  of  all  reachable  states  at  a  given 
program  node  (for  checking  assertions  at  that  node),  using  backward  or  forward  search. 
There  are  also  algorithms  for  computing  a  set  of  witnesses — a  set  of  paths  that  justify 
the  computed  set  of  reachable  states.  Such  witnesses  can  be  used  for  reporting  errors  or 
generating  counterexamples  for  subsequent  abstraction  refinement.  Moreover,  because  they 
are  based  on  pushdown  systems,  WPDSs  can  answer  a  richer  set  of  queries  about  the  model 
than  can  be  answered  by  classical  interprocedural  dataflow-analysis  algorithms  [88,  52,  84], 
which  only  provide  the  ability  to  compute  the  set  of  all  reachable  states  at  a  given  program 
node.  There  are  algorithms  for  WPDSs  that  also  compute  the  set  of  reachable  states  at 
a  given  program  node  and  for  a  given  calling  context  for  that  node,  or  for  a  regular  set 
of calling contexts for the node. In our earlier work, we showed that these queries, called
stack-qualified queries, can be useful in the interprocedural setting [56].

Three  implementations  of  WPDSs  are  publicly  available  [49,  47,  86],  and  all  three  provide 
a  convenient  base  for  implementing  different  analyses.  As  a  programming  abstraction,  these 
systems  offer  several  benefits: 

•  An  analyzer  is  created  by  means  of  a  declarative  specification:  one  specifies  a  weight 
domain,  along  with  an  encoding  of  the  program’s  ICFG  and  a  mapping  of  each  ICFG 
edge  to  a  weight. 

•  They  permit  the  creation  of  libraries  of  reusable  weight  domains,  which  can  also  be 
used  to  create  new  weight  domains  by  means  of  weight-domain  construction  operations 
(pairing,  reduced  product  [26],  tensor  product  [71],  etc.) 

•  They  allow  for  symbolic  analysis.  Encoding  Boolean  programs  as  WPDSs  can  be 
exponentially  more  succinct  than  encoding  them  as  PDSs.  This  happens  because  the 
weights  can  be  encoded  symbolically,  e.g.,  using  BDDs.  In  this  case,  the  analysis  of  the 
WPDS using the standard WPDS reachability algorithms corresponds to a symbolic
analysis  of  the  Boolean  program. 

•  Compared  with  other  tools  that  support  the  creation  of  program  analyzers  from  high- 
level  specifications,  (i)  the  WPDS  implementations  allow  more  sophisticated  abstract 
domains  to  be  used  (such  as  the  domains  for  affine-relation  analysis  [67,  68]),  and  (ii) 
they  also  permit  a  broader  range  of  dataflow-analysis  queries  to  be  posed  (in  particular, 
stack-qualified queries) than is possible with tools such as Banshee [53] and bddbddb
[94].

Several  of  the  contributions  of  this  dissertation  are  made  using  WPDSs  as  a  starting 
point;  these  results  all  retain  the  benefits  of  WPDSs  mentioned  above.  PDSs  and  WPDSs 
are  discussed  in  more  detail  in  Chapter  2.  Readers  familiar  with  PDSs  and  WPDSs  may 
skip  reading  this  chapter,  and  use  it  only  as  reference  material. 

The  rest  of  this  section  describes  the  contributions  made  by  this  dissertation. 

First, we generalized the WPDS model to extended weighted pushdown systems (EWPDSs).
(This result is presented in Chapter 3.) WPDSs, while expressive, do not provide a
way  to  model  the  local  variables  of  a  procedure.  EWPDSs  provide  a  way  in  which  a  weight 
only  has  to  describe  the  transformation  on  the  variables  in  scope.  In  addition  to  the  weights, 
merge  functions  can  be  provided  that  take  care  of  the  change  in  scope  across  procedure 
boundaries. 

With EWPDSs, it was possible to build many more applications than was possible with
WPDSs. EWPDSs have been used for checking properties of Boolean programs (Chapter
4); computing affine relations in x86 programs (Section 3.5.2); computing aliasing in a
program with single-level pointers (Section 3.5.3); and modeling individual threads inside
model checkers for concurrent programs [48, 58].

In our earlier work, we used EWPDSs to design a debugging application, called BTrace
[56], which tries to find the erroneous run of a program, given certain data related to the
error, such as the stack trace dumped at a program crash. BTrace benefited from being
able to make use of the reachability algorithms for EWPDSs. For instance, it uses the ability
of EWPDSs to answer stack-qualified queries to obtain information associated with a stack
trace. It also uses the witness-tracing feature to reconstruct the failing path.

The  EWPDS  model  and  some  of  its  applications  are  described  in  Chapter  3. 

Second, we showed how to improve the fixpoint computation of both WPDSs and EWPDSs
using graph-theoretic techniques. We call the resulting algorithm FWPDS (the “F”
stands for “Fast”). The previous algorithms for (E)WPDSs were based on chaotic iteration
to compute the fixpoint, which is also typical of other program-analysis tools. We noticed
that adding direction to the chaotic iteration could reduce the running time substantially.
Tarjan had earlier given an efficient iteration strategy for graphs, which applies to programs
with a single procedure [91, 90]. We generalized his algorithm to (E)WPDSs. As a result,
FWPDS applies to programs with multiple procedures.

FWPDS applies to all applications that use (E)WPDSs. FWPDS resulted in median
speedups of 1.8x for finding affine relations in x86 programs, 3.6x for BTrace, and 2.6x for
checking properties of Boolean programs.

We  also  developed  techniques  for  incremental  analysis,  as  well  as  efficient  counterexample 
generation.  Both  techniques  are  useful  in  program-verification  tools.  FWPDS  is  discussed 
in  Chapter  4. 

Third,  we  showed  how  to  combine  forwards  and  backwards  interprocedural  analysis  to 
compute  what  we  call  error  projections  (Chapter  5).  An  error  projection  is  the  union  of  all 
error  traces  in  a  program. 

Typically,  when  an  analysis  concludes  that  bad  states  are  reachable  in  an  abstract  model, 
but  the  counterexample  is  infeasible  in  the  original  program,  the  model  is  refined  and  the 
search  is  restarted.  The  single  counterexample  is  the  only  information  that  is  carried  forward 
to  the  next  refined  model.  Error  projections  allow  much  more  information  to  be  carried 
forward.  In  particular,  the  part  of  the  model  outside  the  error  projection  is  provably  correct 
because  any  execution  that  travels  outside  the  error  projection  cannot  lead  to  an  error.  Thus, 
the error projection represents the smallest part of the model that needs to be refined (and
re-analyzed). Error projections can be used for speeding up abstraction refinement, as well
as  for  reporting  errors  back  to  the  user. 

We showed how to efficiently and precisely compute error projections when the program
model is an (E)WPDS. Our algorithm uses forward and backward reachability on the
(E)WPDS, which can in turn be sped up using FWPDS. Error projections and algorithms
for computing them are discussed in Chapter 5.
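
On a finite graph, the idea behind this computation can be sketched in a few lines: a node lies in the error projection exactly when it is reachable from the initial node and can itself reach the error node. The sketch below is only an illustration under that simplifying assumption. It works on a plain directed graph rather than on an (E)WPDS, and the names `reachable`, `error_projection`, and the example edges are hypothetical.

```python
from collections import deque

def reachable(edges, sources):
    """Return every node reachable from `sources` by following `edges`."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    seen = set(sources)
    work = deque(sources)
    while work:
        u = work.popleft()
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                work.append(v)
    return seen

def error_projection(edges, init, error):
    """Nodes that lie on at least one path from `init` to `error`."""
    forward = reachable(edges, [init])                           # forward search
    backward = reachable([(v, u) for u, v in edges], [error])    # backward search
    return forward & backward

# Hypothetical model: "c" is reachable but cannot reach the error node,
# and "d" can reach the error node but is itself unreachable.
edges = [("init", "a"), ("a", "b"), ("b", "err"), ("a", "c"), ("d", "err")]
projection = error_projection(edges, "init", "err")
print(sorted(projection))
```

In this toy model, nodes `c` and `d` fall outside the projection, so a refinement step would not need to revisit them.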

1.3.2  New  Technology  for  Concurrent  Programs 

The  dissertation  also  makes  a  significant  contribution  to  the  design  and  implementation  of 
efficient  and  practical  verification  tools  for  concurrent  programs.  We  target  shared-memory 
concurrent  programs,  whose  verification  is  considered  a  challenging  problem  because  of  the 
fine-grained  interactions  that  can  occur  between  threads. 

Earlier in the chapter, we mentioned that combining interprocedural analysis and
concurrency leads to undecidability, and as a consequence, most tools give up precise
interprocedural reasoning. We make a trade-off in a different direction, which is to bound
concurrency, but retain the interprocedural aspect.

We place a bound on the number of context switches that can happen in any execution of
a program. A context switch is defined as the transfer of control from one thread to another
thread. We build tools that perform verification under a given context bound K; i.e., they
can determine whether a bug exists in any program execution with K context switches or fewer.
We call verification under a context bound context-bounded analysis (CBA).
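
To make the definition concrete, the following sketch enumerates the executions of a tiny multithreaded program that use at most a given number of context switches, and then raises the bound until a bug shows up. It is an explicit-state illustration only, not the algorithm developed in this dissertation: threads are assumed to be straight-line lists of atomic steps, thread-local temporaries are folded into the state tuple for brevity, and the names `explore`, `smallest_bug_bound`, and `lost_update` are hypothetical.

```python
def explore(threads, init_shared, bound):
    """Return all (state, program counters) pairs reachable with at most
    `bound` context switches. Each thread is a list of atomic steps; a step
    maps a state to a new state."""
    n = len(threads)
    # A configuration: (state, per-thread pcs, active thread, switches used).
    start = [(init_shared, (0,) * n, t, 0) for t in range(n)]
    seen = set(start)
    work = list(start)
    while work:
        shared, pcs, active, used = work.pop()
        successors = []
        if pcs[active] < len(threads[active]):
            # The active thread executes its next atomic step.
            step = threads[active][pcs[active]]
            new_pcs = pcs[:active] + (pcs[active] + 1,) + pcs[active + 1:]
            successors.append((step(shared), new_pcs, active, used))
        if used < bound:
            # Context switch: control is transferred to another thread.
            for other in range(n):
                if other != active:
                    successors.append((shared, pcs, other, used + 1))
        for conf in successors:
            if conf not in seen:
                seen.add(conf)
                work.append(conf)
    return {(shared, pcs) for shared, pcs, _, _ in seen}

def smallest_bug_bound(threads, init_shared, is_bug, max_bound):
    """Increase the context bound step by step until a bug is found."""
    for k in range(max_bound + 1):
        if any(is_bug(s, pcs) for s, pcs in explore(threads, init_shared, k)):
            return k
    return None

# State is (x, tmp1, tmp2); each thread performs a non-atomic x = x + 1.
thread1 = [lambda s: (s[0], s[0], s[2]), lambda s: (s[1] + 1, s[1], s[2])]
thread2 = [lambda s: (s[0], s[1], s[0]), lambda s: (s[2] + 1, s[1], s[2])]

def lost_update(shared, pcs):
    """Both threads have finished, but x is not 2."""
    return pcs == (2, 2) and shared[0] != 2

bug_bound = smallest_bug_bound([thread1, thread2], (0, 0, 0), lost_update, 4)
print(bug_bound)
```

For this racy increment, the lost update needs one thread to be preempted between its read and its write and later resumed, so the search reports the bug only once two context switches are allowed.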

A natural question to ask at this point is which abstraction is more useful: one that
gives up precise handling of procedures but does not require any bound on the number of
context switches, or one that can handle procedures but requires a bound on the number of
context switches? While one approach may not be provably better than the other, there are
advantages to exploring the latter approach:

•  For finite-data programs, the analysis of concurrent programs without procedures is
PSPACE-complete, whereas CBA of concurrent programs with procedures is only
NP-complete (Chapter 6).

•  Retaining procedures in the abstract model implies that the sequential part of the
analysis remains precise. Consequently, if the global verification property requires strong
“thread-local” invariants to be established in each executing thread, this technique will
do well in proving the global property. A technique that does not handle procedures
would not be able to get off the ground, even in the presence of a context bound. One
such example is shown in Fig. 1.6.

•  Many  program  bugs  can  be  found  in  a  few  context  switches  [77,  78,  70,  59].  KISS  [78] 
showed  how  a  number  of  concurrency  bugs  could  be  found  by  exploring  just  two  context 
switches  and  two  threads.  Furthermore,  Musuvathi  and  Qadeer  [70]  used  an  explicit- 
state  model  checker  on  programs  with  a  closed  environment  (i.e.,  with  fixed  input),  to 
systematically  explore  all  their  interleavings;  this  approach  uncovered  numerous  bugs. 
In  our  work  (Chapter  7),  we  showed  that  most  bugs  can  be  found  in  a  few  context 
switches  even  for  programs  with  an  open  environment  (where  the  input  is  not  fixed). 

•  The  context  bound  can  be  iteratively  increased  to  find  more  bugs.  This  has  the  added 
advantage  of  finding  bugs  in  the  smallest  number  of  context  switches  needed  to  trigger 
them,  which  can  help  in  understanding  the  bug.  Thus,  concurrency  is  added  gradually 
by  increasing  the  context  bound. 

•  The number of context switches seems to be a good measure of the “hardness” of a bug.
A  bug  that  requires  more  context  switches  before  it  is  triggered  can  be  regarded  as 
being  more  complicated  than  a  bug  that  can  be  triggered  in  fewer  context  switches. 
Hence,  CBA  is  “demand-driven”  in  terms  of  concurrency.  The  context-switch  bound 
can  be  iteratively  increased  to  gain  a  greater  degree  of  assurance  about  the  correctness 
of  the  program  or  to  find  more  bugs.  The  analysis  will  incur  higher  costs  only  for 
finding  more  complicated  bugs. 

The disadvantage of bounding the number of context switches is that it is unsound and
cannot  be  used  to  verify  the  absence  of  bugs.  It  can  only  provide  a  correctness  guarantee 
under  a  bound  on  the  number  of  context  switches,  for  example,  by  proving  that  the  program 
does  not  have  any  bugs  when  fewer  than  10  context  switches  occur. 

There has been work on CBA prior to our work. Qadeer and Rehof [77] showed that when
the number of context switches is bounded, the reachability problem for Boolean programs is
decidable. They also gave an algorithm for this problem, but its complexity is exponential in
the number of context switches, and it has not been implemented. Subsequently, we showed that
CBA of Boolean programs is NP-complete (Chapter 6), thus indicating that an algorithm
with lower worst-case complexity may not be possible at all.

This dissertation makes two contributions towards realizing practical algorithms for CBA.
The theme of each of these contributions is that one can take an existing analysis for
sequential programs and automatically extend it to perform CBA.

Result 1 (Chapter 6): We showed that if each thread is modeled using a WPDS then CBA
is decidable, and we also gave an algorithm for performing CBA. This result requires one extra
property on the weights: they must have a tensor operation. We also showed that this
operation exists for a large class of abstractions $A_{tensor}$, which includes finite-state ones, such
as the one required for encoding Boolean programs, as well as infinite-state abstractions, such
as the one required for affine-relation analysis. Our result generalizes the work of Qadeer
and Rehof to a larger class of abstractions (and does so using quite different techniques).

The significance of our result is that one only needs to show two properties to
automatically obtain an algorithm for CBA: (i) the abstraction satisfies (or approximates) the
properties required by a WPDS; and (ii) the abstraction belongs to the class $A_{tensor}$, i.e.,
an appropriate tensor operation exists. Neither of these requires any concurrency-related
reasoning.

We obtained this result in two steps. First, we showed that all behaviors of a WPDS
can be captured using a weighted transducer. A transducer is like a finite-state machine,
but has an output tape as well. A weighted transducer, additionally, produces a weight
for every string that it writes on the output tape. We showed that one can construct a
weighted transducer τ for a thread T that summarizes the behavior of T (when T does not
yield control to other threads) in the following sense: when a state s of T is written on the
input tape of τ, it can write a state s' on the output tape with weight w if and only if the
net effect of executing T from s to s' is w. This provides a strong characterization of the
set of behaviors of thread T. Second, we used such thread-summarization transducers to
devise a compositional approach to CBA: CBA reduces to composing thread-summarization
transducers as many times as the number of context switches. We showed that this can be
done provided the weights have a tensor operation.

This work provided theoretical insight into CBA. However, the construction of the
transducers can be an expensive operation. We improved on this by giving a more direct way of
performing CBA that avoids the transducer construction.

Result 2 (Chapter 7): We showed that, given a concurrent program P and a context
bound K, one can create a sequential program $P_K$ such that the analysis of $P_K$ is sufficient
for CBA of P under the bound K. This reduction is a source-to-source transformation, and
requires no assumptions or extra work on the part of the user, except for the identification
of thread-local data. We implemented this technique for a language that is used to specify
Boolean programs to create the first known implementation of CBA. It scales to programs
with shared state space as large as $2^{24}$ states and 10 context switches.

The key insight behind this result is that in a program with two threads $T_1$ and $T_2$,
execution proceeds with control alternating between the two threads: $T_1; T_2; T_1; \cdots$.
Concurrent analysis is hard because during the analysis of $T_2$, one also has to keep track of the
local state of $T_1$, so that it can be restored when $T_1$ resumes execution. Keeping track of
multiple local states typically makes the verification task expensive (or even undecidable).
We solved this by transforming the threads to $T_1^s$ and $T_2^s$ so that one only has to analyze
$T_1^s; T_1^s; \cdots; T_2^s; T_2^s; \cdots$, which involves no thread interleaving. For the program to be able
to have this structure, the context switches have to be simulated. To simulate $T_1$ relinquishing
control to $T_2$, $T_1^s$ simply guesses the effect that $T_2$ will have on the shared state and resumes
execution. It does this K times, making a total of K guesses. Next, control is passed to
$T_2^s$, which verifies whether the K guesses made by $T_1^s$ were correct. If not, execution is
aborted. This ensures that any execution that is not aborted is a valid execution of the
concurrent program. We showed that this guess-and-check strategy can be implemented using
a source-to-source transformation.
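
The guess-and-check idea can be illustrated with a small brute-force sketch. It is not the source-to-source transformation itself: it simply enumerates guesses over an assumed finite shared domain, runs all of the first thread with the guesses injected, then runs the second thread and discards every execution in which a guess turns out to be wrong. The helper names (`run`, `cuts`, `sequentialized`, `interleaved`), the `rounds` parameter (the first thread makes `rounds - 1` guesses), and the two example threads are assumptions made for the illustration.

```python
from itertools import product, combinations_with_replacement

def run(steps, shared, local):
    """Execute a straight-line list of steps on (shared, local)."""
    for step in steps:
        shared, local = step(shared, local)
    return shared, local

def cuts(length, rounds):
    """All ways to split `length` steps into `rounds` consecutive segments."""
    for inner in combinations_with_replacement(range(length + 1), rounds - 1):
        bounds = (0,) + inner + (length,)
        yield [(bounds[i], bounds[i + 1]) for i in range(rounds)]

def sequentialized(t1, t2, init_shared, domain, rounds):
    """Final (shared, local1, local2) triples found by guess-and-check."""
    results = set()
    for guesses in product(domain, repeat=rounds - 1):
        for cut1 in cuts(len(t1), rounds):
            # Phase 1: run all of T1, replacing T2's effect by the guesses.
            saved = []                       # shared state at the end of each T1 segment
            shared, local1 = init_shared, 0
            for i, (lo, hi) in enumerate(cut1):
                if i > 0:
                    shared = guesses[i - 1]  # guessed effect of T2's previous segment
                shared, local1 = run(t1[lo:hi], shared, local1)
                saved.append(shared)
            for cut2 in cuts(len(t2), rounds):
                # Phase 2: run all of T2 and check every guess.
                local2, ok = 0, True
                for i, (lo, hi) in enumerate(cut2):
                    shared2, local2 = run(t2[lo:hi], saved[i], local2)
                    if i < rounds - 1 and shared2 != guesses[i]:
                        ok = False           # wrong guess: abort this execution
                        break
                if ok:
                    results.add((shared2, local1, local2))
    return results

def interleaved(t1, t2, init_shared, rounds):
    """Reference semantics: actually alternate T1 and T2 segments."""
    results = set()
    for cut1 in cuts(len(t1), rounds):
        for cut2 in cuts(len(t2), rounds):
            shared, local1, local2 = init_shared, 0, 0
            for (lo1, hi1), (lo2, hi2) in zip(cut1, cut2):
                shared, local1 = run(t1[lo1:hi1], shared, local1)
                shared, local2 = run(t2[lo2:hi2], shared, local2)
            results.add((shared, local1, local2))
    return results

read = lambda shared, local: (shared, shared)       # local = x
write = lambda shared, local: (local + 1, local)    # x = local + 1
t1, t2 = [read, write], [read, write]
seq = sequentialized(t1, t2, 0, range(3), rounds=2)
print(sorted(seq))
```

On this example the sequential enumeration finds the same final states as the direct interleaving, including the lost update in which the shared counter ends at 1. The brute-force enumeration of guesses is what a sequential analysis of the transformed program would handle symbolically.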

The program $P_K$ is (i) $nK$ times larger than P, where n is the number of threads, and
(ii) has K times the number of variables as P. The former shows another salient feature of
our reduction: CBA scales linearly with the number of threads. The latter shows that there
is no free lunch: the worst-case complexity of analyzing sequential programs typically grows
exponentially with the number of variables. Thus, the analysis of $P_K$ scales exponentially
with K, in the worst case. This result was expected because we had earlier proved that CBA
is NP-complete (when variables are Boolean-valued).

We  present  our  conclusions  in  Chapter  8. 

Chapter 2

Background:  Abstract  Models  and  Their  Analysis 

This  chapter  discusses  some  common  abstract  models  and  algorithms  for  their  analysis. 
All  of  the  models  are  for  sequential  programs  with  multiple  procedures,  and  their  analyses 
will  be  interprocedural. 

In  Section  2.1,  we  define  the  dataflow  model,  which  has  been  commonly  used  in  the 
compiler  literature.  It  serves  to  lay  down  some  of  the  useful  definitions  and  concepts.  Next,  in 
Section  2.2  and  Section  2.3,  we  define  Boolean  programs  and  pushdown  systems,  respectively, 
which  are  more  popular  in  verification  and  model-checking  communities.  In  Section  2.4, 
we  define  weighted  pushdown  systems  (WPDSs),  which  are  capable  of  encoding  all  of  the 
previous  models  under  certain  conditions.  WPDSs  merge  the  concepts  that  are  common  to 
compilers  and  verification.  We  mostly  present  results  for  the  analysis  of  PDSs  and  WPDSs. 
We  will  build  on  these  results  in  later  chapters. 

While discussing the analysis of an abstract model, we focus attention only on
assertion checking. There are two important classes of properties that one would like to verify on
programs: safety properties and liveness properties. Safety properties address finite (but
possibly unbounded) behaviors of a program, e.g., memory safety, information flow, API
usage rules, etc., whereas liveness properties address infinite behaviors, e.g., non-termination,
response time, etc. In this dissertation, we focus only on safety properties. For such
properties, it is possible to reduce the problem of checking them on programs to assertion checking
by inserting instrumentation in the program that keeps track of all the information relevant
to checking the property (similar to inlined reference monitors [29]). For example, memory
safety can be checked by inserting assertions that check that a pointer is not null right before
it is dereferenced. For array-bounds checking, one can keep track of the size of each allocated
memory block and assert that any access to the array is within the bound. An instance of
this technique for finite-state properties is given in Section 2.4.3.

Reducing the verification property to assertion checking simplifies the design of the
analysis phase. To check an assertion at program node n, one only needs to find the set of
reachable states at n, and then check whether this set contains a state that violates the
asserted condition. This can be simplified further by converting an assert(φ) statement to
“if (!φ) then goto error”, and then simply checking whether node error is reachable.

To  avoid  overloading  the  term  “state”,  we  may  refer  to  the  instantaneous  state  of  a 
program  as  a  memory  configuration  when  there  is  a  possibility  of  confusion  with  other  terms. 

Notation. A binary relation on a set $S$ is a subset of $S \times S$. If $R_1$ and $R_2$ are binary
relations on $S$, then their relational composition, denoted by “$R_1; R_2$”, is defined by
$\{(s_1, s_3) \mid \exists s_2 \in S,\ (s_1, s_2) \in R_1,\ (s_2, s_3) \in R_2\}$. If $R$ is a binary relation, $R^i$ is the
relational composition of $R$ with itself $i$ times, and $R^0$ is the identity relation on $S$.
$R^* = \bigcup_{i=0}^{\infty} R^i$ is the reflexive-transitive closure of $R$.
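
For finite relations, these definitions translate directly into set operations. The following is a minimal sketch, assuming relations are represented as Python sets of pairs; the function names `compose`, `power`, and `star` are chosen here for illustration.

```python
def compose(r1, r2):
    """Relational composition R1; R2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def power(r, i, universe):
    """R^i: R composed with itself i times; R^0 is the identity on `universe`."""
    result = {(s, s) for s in universe}
    for _ in range(i):
        result = compose(result, r)
    return result

def star(r, universe):
    """Reflexive-transitive closure: add compositions until nothing new appears."""
    closure = {(s, s) for s in universe}
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger

S = {1, 2, 3}
R = {(1, 2), (2, 3)}
print(power(R, 2, S))
print(sorted(star(R, S)))
```

On a finite set the loop in `star` terminates because each round either adds at least one pair or stops.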

2.1  The  Dataflow  Model 

Dataflow analysis is used more broadly than just for program verification. It is a technique
commonly used in compilers to enable compiler optimizations. Irrespective of whether a
verification property is present or not, dataflow analysis is concerned with finding a dataflow
value associated with each program node n that summarizes possible memory
configurations whenever control reaches n. The dataflow value for n safely approximates (i.e.,
overapproximates) the set of memory configurations reachable at node n.

The  dataflow  model  of  a  program  consists  of  the  following  elements: 

•  The  interprocedural  control-flow  graph  (ICFG)  of  the  program. 

•  A join semilattice $(V, \sqcup)$ with least element $\bot$:

    -  Elements of $V$ are called dataflow values. A dataflow value represents a set of
       possible memory configurations.

    -  The join operator $\sqcup$ is used for combining information obtained along different
       paths.

•  A value $v_0 \in V$ that represents the set of possible memory configurations at the
beginning of the program.

•  An assignment $M$ of dataflow transfer functions (of type $V \to V$) to the edges of the
ICFG: $M(e) \in V \to V$.

A  dataflow-analysis  problem  can  be  formulated  as  a  path-function  problem. 

Definition 2.1.1. A path of length $j$ from node $m$ to node $n$ is a (possibly empty) sequence
of $j$ edges, denoted by $[e_1, e_2, \ldots, e_j]$, such that the source of $e_1$ is $m$, the target of $e_j$ is $n$,
and for all $i$, $1 \le i \le j - 1$, the target of edge $e_i$ is the source of edge $e_{i+1}$.

The path function $pf_q$ for path $q = [e_1, e_2, \ldots, e_j]$ is the composition, in reverse order, of
$q$’s transfer functions: $pf_q = M(e_j) \circ \ldots \circ M(e_2) \circ M(e_1)$. The path function for an empty
path is the identity function from $V$ to $V$.
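
As a small illustration of this definition, the sketch below builds a path function by applying the transfer function of the first edge first and that of the last edge last. The dictionary `M`, the edge names `e1` and `e2`, and the use of Python dictionaries as dataflow values are assumptions made for the example.

```python
def path_function(path, M):
    """Return pf_q for q = path: M(e_1) is applied first, M(e_j) last.
    For the empty path, the result is the identity function."""
    def pf(v):
        for e in path:
            v = M[e](v)
        return v
    return pf

# Hypothetical transfer functions: e1 models "a = 5", e2 models "b = a".
M = {
    "e1": lambda env: {**env, "a": 5},
    "e2": lambda env: {**env, "b": env["a"]},
}
v0 = {"a": "T", "b": "T"}   # "T" stands for an unknown value

print(path_function(["e1", "e2"], M)(v0))
print(path_function([], M)(v0))
```

Swapping the two edges gives a different result, which is why the order of composition matters.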

2.1.1  Join  Over  All  Paths 

In intraprocedural dataflow analysis, the goal is to determine, for each node $n$, the
“join-over-all-paths” (JOP) solution:

\[ \mathrm{JOP}_n = \bigsqcup_{q \in \mathrm{Paths}(enter,\, n)} pf_q(v_0), \]

where $\mathrm{Paths}(enter, n)$ denotes the set of paths in the ICFG from the enter node to $n$ [51].
$\mathrm{JOP}_n$ represents a summary of the possible memory configurations that can arise at $n$:
because $v_0 \in V$ represents the set of possible memory configurations at the beginning of
the program, $pf_q(v_0)$ represents the contribution of path $q$ to the memory configurations
summarized at $n$.

The soundness of the $\mathrm{JOP}_n$ solution with respect to the programming language’s concrete
semantics is established by the methodology of abstract interpretation [24]:

•  A Galois connection (or Galois insertion) is established to define the relationship
between sets of concrete states and elements of $V$.

•  Each dataflow transfer function $M(e)$ is shown to overapproximate the transfer function
for the concrete semantics of $e$.

In  the  discussion  below,  we  assume  that  such  correctness  requirements  have  already  been 
taken  care  of,  and  we  concentrate  only  on  algorithms  for  determining  dataflow  values  once 
a  dataflow  model  has  been  given. 

2.1.2  Example:  Copy-Constant  Propagation 

This section gives an example of a dataflow model that can be used to do copy-constant
propagation, in which only statements of the form “x = y” and “x = constant” are
interpreted, whereas the other statements are over-approximated. The goal of the analysis is to
find if a variable always holds a constant value at some point in the program.

An example ICFG is shown in Fig. 2.1. The dashed and dotted arrows represent the two
procedure calls to f and their return back to main. Let $Var$ be the set of all variables in a
program, and let $(\mathbb{Z}^\top, \sqsubseteq, \sqcup)$, where $\mathbb{Z}^\top = \mathbb{Z} \cup \{\top\}$, be the standard constant-propagation
semilattice: for all $c \in \mathbb{Z}$, $c \sqsubseteq \top$; for all $c_1, c_2 \in \mathbb{Z}$ such that $c_1 \neq c_2$, $c_1$ and $c_2$ are
incomparable; and $\sqcup$ is the least-upper-bound operation in this partial order. $\top$ stands
for “not-a-constant”. Let $D = (Env \to Env)$ be the set of all environment transformers,
where an environment is a mapping for all variables: $Env = (Var \to \mathbb{Z}^\top) \cup \{\bot\}$. We use
$\bot$ to denote an infeasible environment. Furthermore, we restrict the set $D$ to contain only
$\bot$-strict transformers, i.e., for all $d \in D$, $d(\bot) = \bot$. We can extend the join operation to
environments by taking join componentwise:

\[
env_1 \sqcup env_2 =
\begin{cases}
env_1 & \text{if } env_2 = \bot \\
env_2 & \text{if } env_1 = \bot \\
\lambda v.(env_1(v) \sqcup env_2(v)) & \text{otherwise}
\end{cases}
\]

    int a, b, y;

    void main() {
    n1:      a = 5;
    n2:      y = 1;
    n3,n4:   f();
    n5:      if (...) {
    n6:        a = 2;
    n7,n8:     f();
             }
    n9:      ...;
    }

    void f() {
    n10:     b = a;
    n11:     if (...)
    n12:       y = 2;
             else
    n13:       y = b;
    }

Figure 2.1 A program fragment and its ICFG. For all unlabeled edges, the environment
transformer is $\lambda e.e$. The statements labeled “...” are assumed not to change any of the
declared variables.

The dataflow transformers are shown as edge labels in Fig. 2.1. A transformer of the form
$\lambda e.e[a \mapsto 5]$ returns an environment that agrees with the argument $e$, except that $a$ is bound
to 5. The environment $\bot$ cannot be updated, and thus $(\lambda e.e[a \mapsto 5])(\bot)$ equals $\bot$. The initial
dataflow value is the environment where all variables are uninitialized:
$[y \mapsto \top, a \mapsto \top, b \mapsto \top]$.
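
The copy-constant domain described in this section can be sketched compactly. The representation below is an assumption made for illustration: the string `"T"` stands for not-a-constant, `None` stands for the infeasible environment, and environments are Python dictionaries; the helper names are hypothetical.

```python
TOP = "T"        # "not-a-constant"
BOTTOM = None    # the infeasible environment

def join_value(c1, c2):
    """Least upper bound in the flat lattice of integers plus TOP."""
    return c1 if c1 == c2 else TOP

def join_env(env1, env2):
    """Componentwise join; BOTTOM is the identity of the join."""
    if env1 is BOTTOM:
        return env2
    if env2 is BOTTOM:
        return env1
    return {v: join_value(env1[v], env2[v]) for v in env1}

def assign_const(var, c):
    """Transformer for "var = c", strict in BOTTOM."""
    return lambda env: BOTTOM if env is BOTTOM else {**env, var: c}

def assign_copy(dst, src):
    """Transformer for "dst = src", strict in BOTTOM."""
    return lambda env: BOTTOM if env is BOTTOM else {**env, dst: env[src]}

init = {"y": TOP, "a": TOP, "b": TOP}
after_n1 = assign_const("a", 5)(init)          # a = 5
after_n10 = assign_copy("b", "a")(after_n1)    # b = a
print(after_n10)
print(join_env(after_n10, assign_const("a", 2)(init)))
```

Joining two environments that disagree on a variable sends that variable to `TOP`, while a variable bound to the same constant on both sides keeps its value.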

2.1.3  Interprocedural  Join  Over  All  Paths 

The interprocedural dataflow analysis problem is similar to the intraprocedural one,
except that the paths that are chosen in the ICFG must be valid interprocedural paths, i.e.,
they should have matching calls and returns. (The exact definition of valid paths can be
found elsewhere [88, 84].)

For instance, in the ICFG shown in Fig. 2.1, the path
$[e_{main}, n_1, n_2, n_3, e_f, n_{10}, n_{11}, x_f, n_4, n_5]$ has matching calls and returns, and hence it is a
valid path; the path $[e_{main}, n_1, n_2, n_3, e_f, n_{10}, n_{11}, x_f, n_8]$ is not a valid path because the
exit-to-return-site edge $x_f \to n_8$ does not correspond to the preceding call-to-enter edge
$n_3 \to e_f$.

In interprocedural dataflow analysis, the goal shifts from finding the join-over-all-paths
solution to the more precise “join-over-all-valid-paths” (JOVP), or “context-sensitive”,
solution. A context-sensitive interprocedural dataflow analysis is one in which the analysis of a
called procedure is “sensitive” to the context in which it is called. A context-sensitive
analysis captures the fact that the results propagated back to each return site r should depend
only on the memory configurations that arise at the call site that corresponds to r. More
precisely, the goal of a context-sensitive analysis is to find the JOVP value for nodes of the
ICFG [88, 52, 84].

Definition 2.1.2. The join-over-all-valid-paths (JOVP) value for an ICFG node $n$ is defined
as follows:

\[ \mathrm{JOVP}_n = \bigsqcup_{q \in \mathrm{VPaths}(e_{main},\, n)} pf_q(v_0), \]

where $\mathrm{VPaths}(e_{main}, n)$ denotes the set of valid paths from the main procedure’s enter node
to $n$.

Although some valid paths may also be infeasible execution paths, none of the non-valid
paths are feasible execution paths. By restricting attention to just the valid paths from
$e_{main}$, we exclude some of the infeasible execution paths. In general, therefore, $\mathrm{JOVP}_n$
characterizes the memory configurations at $n$ more precisely than $\mathrm{JOP}_n$.

Example 2.1.3. For the dataflow model described in Section 2.1.2,
$\mathrm{JOVP}_{n_4} = [y \mapsto \top, a \mapsto 5, b \mapsto 5]$ and $\mathrm{JOVP}_{n_8} = [y \mapsto 2, a \mapsto 2, b \mapsto 2]$. This proves, for
instance, that the variable y holds the value 2 whenever control reaches node $n_8$, irrespective
of the path taken to reach $n_8$.
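
The values in Example 2.1.3 can be reproduced by enumerating valid paths directly, which is feasible here only because the program has no loops and no recursion. The sketch below is an illustration, not an algorithm from this dissertation. The edge list is an assumed reading of the program in Fig. 2.1, in which the edge leaving a node carries the transformer of that node's statement; the names `EDGES` and `jovp` are hypothetical.

```python
TOP = "T"   # "not-a-constant"

def join(env1, env2):
    """Componentwise join; None means no path has contributed yet."""
    if env1 is None:
        return env2
    if env2 is None:
        return env1
    return {v: env1[v] if env1[v] == env2[v] else TOP for v in env1}

ident = lambda e: e
def const(var, c): return lambda e: {**e, var: c}
def copy(dst, src): return lambda e: {**e, dst: e[src]}

# (source, target, transformer, kind, return site for call edges)
EDGES = [
    ("e_main", "n1", ident, "intra", None),
    ("n1", "n2", const("a", 5), "intra", None),
    ("n2", "n3", const("y", 1), "intra", None),
    ("n3", "e_f", ident, "call", "n4"),
    ("x_f", "n4", ident, "ret", None),
    ("n4", "n5", ident, "intra", None),
    ("n5", "n6", ident, "intra", None),
    ("n5", "n9", ident, "intra", None),
    ("n6", "n7", const("a", 2), "intra", None),
    ("n7", "e_f", ident, "call", "n8"),
    ("x_f", "n8", ident, "ret", None),
    ("n8", "n9", ident, "intra", None),
    ("e_f", "n10", ident, "intra", None),
    ("n10", "n11", copy("b", "a"), "intra", None),
    ("n11", "n12", ident, "intra", None),
    ("n11", "n13", ident, "intra", None),
    ("n12", "x_f", const("y", 2), "intra", None),
    ("n13", "x_f", copy("y", "b"), "intra", None),
]

def jovp(target, v0):
    """Join pf_q(v0) over all valid paths q from e_main to `target`."""
    result = None
    work = [("e_main", v0, ())]   # (node, value so far, pending return sites)
    while work:
        node, env, stack = work.pop()
        if node == target:
            result = join(result, env)
        for src, dst, f, kind, ret_site in EDGES:
            if src != node:
                continue
            new_stack = stack
            if kind == "call":
                new_stack = stack + (ret_site,)
            elif kind == "ret":
                # A return is valid only if it matches the most recent call.
                if not stack or stack[-1] != dst:
                    continue
                new_stack = stack[:-1]
            work.append((dst, f(env), new_stack))
    return result

v0 = {"y": TOP, "a": TOP, "b": TOP}
print(jovp("n4", v0))
print(jovp("n8", v0))
```

The stack of pending return sites is what rules out the mismatched return to `n8` after the first call.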

2.1.4  Solving  for  JOP 

In  this  section,  we  briefly  sketch  an  algorithm  for  finding  the  JOP  value.  The  JOP  value 
cannot  be  computed  directly  by  using  its  definition  because  it  involves  taking  joins  over  an 
unbounded  number  of  values.  It  is  computed  using  a  fixpoint  iteration. 

For each node $n$ in the ICFG, let $X_n$ be a variable ranging over $V$, the set of dataflow
values. Initialize $X_{enter}$ to $v_0$ and all other variables to $\bot$. Next, repeat the following until
the values of all of the variables stop changing: choose any edge $e = (n, m)$ in the ICFG;
update $X_m$ to $(X_m \sqcup M(e)(X_n))$. Once this iteration is finished, call the resulting value of
$X_n$ $\mathrm{LFP}_n$ (the least fixpoint value at $n$).

In  the  above  algorithm,  the  aspect  of  choosing  an  edge  randomly  among  all  possible 
edges  is  an  instance  of  the  chaotic  iteration  strategy,  where  one  chooses  randomly  from  a 
set  of  possibilities  to  make  progress.  Chapter  4  discusses  ways  of  improving  over  the  chaotic 
iteration  strategy. 
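
The chaotic-iteration scheme can be sketched as follows. The example applies it to an assumed edge list for the ICFG of Fig. 2.1, treating call and return edges as ordinary edges, so the result is a join over all paths, including invalid ones. Seeding the entry node with the initial value, the random choice among edges that can still change a value, and all names in the code are assumptions of this sketch.

```python
import random

TOP = "T"   # "not-a-constant"; None plays the role of the infeasible environment

def join(env1, env2):
    if env1 is None:
        return env2
    if env2 is None:
        return env1
    return {v: env1[v] if env1[v] == env2[v] else TOP for v in env1}

def strict(update):
    """Make a transformer map the infeasible environment to itself."""
    return lambda env: None if env is None else update(env)

ident = strict(lambda e: e)
def const(var, c): return strict(lambda e: {**e, var: c})
def copy(dst, src): return strict(lambda e: {**e, dst: e[src]})

EDGES = [
    ("e_main", "n1", ident), ("n1", "n2", const("a", 5)),
    ("n2", "n3", const("y", 1)), ("n3", "e_f", ident),
    ("x_f", "n4", ident), ("n4", "n5", ident),
    ("n5", "n6", ident), ("n5", "n9", ident),
    ("n6", "n7", const("a", 2)), ("n7", "e_f", ident),
    ("x_f", "n8", ident), ("n8", "n9", ident),
    ("e_f", "n10", ident), ("n10", "n11", copy("b", "a")),
    ("n11", "n12", ident), ("n11", "n13", ident),
    ("n12", "x_f", const("y", 2)), ("n13", "x_f", copy("y", "b")),
]

def chaotic_iteration(edges, entry, v0, seed=0):
    rng = random.Random(seed)
    X = {}
    for n, m, _ in edges:
        X[n] = None
        X[m] = None
    X[entry] = v0   # seed the entry node with the initial dataflow value
    while True:
        # Edges whose application would still change the value at their target.
        pending = [(n, m, f) for n, m, f in edges
                   if join(X[m], f(X[n])) != X[m]]
        if not pending:
            return X
        n, m, f = rng.choice(pending)
        X[m] = join(X[m], f(X[n]))

v0 = {"y": TOP, "a": TOP, "b": TOP}
X = chaotic_iteration(EDGES, "e_main", v0)
print(X["n3"])
print(X["n8"])
```

Because the two calls to f are merged, the value computed at `n8` binds every variable to `TOP`, which is less precise than the value obtained when only valid paths are considered. The order in which edges are chosen affects the number of steps but not the final result.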

An early result in dataflow analysis shows that $\mathrm{LFP}_n = \mathrm{JOP}_n$, provided that the
functions $M(e)$ are distributive for all edges $e$ [44]. A function $f : V \to V$ is distributive if
$f(v_1 \sqcup v_2) = f(v_1) \sqcup f(v_2)$ for all $v_1, v_2 \in V$.¹ Moreover, the iteration in the above algorithm
will terminate if $V$ has no infinite ascending chains (in the partial order defined by $\sqcup$). Such
a result, which equates the ideal value (JOP) with one obtained from an iterative algorithm
(LFP), is called a coincidence theorem.

¹ We shall see that the distributivity property is an important requirement in later sections as well.

A coincidence theorem for interprocedural dataflow analysis was given by Sharir and
Pnueli [88]. Knoop and Steffen extended the dataflow model to better deal with local
variables, and also gave a coincidence theorem for this extended model [52]. This
theorem is covered in Section 3.3.

We do not give an algorithm for JOVP here (it can be found elsewhere [88, 84]), but
weighted pushdown systems can encode dataflow models, and the algorithms for them also
provide algorithms to solve for JOVP on dataflow models. As we shall see in later sections,
whenever an interprocedural analysis is carried out (to compute a value similar to the JOVP
value), the computation is always done over variables $X_n$ that range over the function space
$V \to V$, instead of variables that range over $V$.

2.2  Boolean  Programs 

A  Boolean  program  can  be  thought  of  as  a  C  program  with  only  the  Boolean  datatype.  It 
does  not  have  any  pointers  or  heap-allocated  storage.  A  Boolean  program  consists  of  a  finite 
set  of  procedures.  It  has  a  finite  set  of  global  variables,  and  a  finite  set  of  local  variables 
for  each  procedure.  Each  variable  can  only  hold  a  value  from  a  finite  domain.2  Boolean 
programs  are  very  commonly  used  by  model  checkers  [4,  6,  96].  They  are  often  obtained  as 
a  result  of  predicate  abstraction  (Section  1.1.1). 

To  simplify  the  discussion,  we  assume  that  procedures  do  not  have  parameters  (they 
can  be  passed  through  global  variables).  The  variables  in  scope  inside  a  procedure  are  the 
global  variables  and  its  set  of  local  variables.  Fig.  2.2(a)  shows  a  Boolean  program  with  two 
procedures  and  two  global  variables  x  and  y  over  a  finite  domain  V  =  {0,1,...  ,7}. 

Let $G$ be the set of valuations of the global variables, and let $Val_i$ be the set of valuations
of the local variables of procedure $i$. The set of global states of a Boolean program is the set

2An assignment to a variable v that holds a value from a finite domain can be thought of as a
collection of assignments to a vector of Boolean-valued variables, namely, the collection of
Boolean-valued variables that holds the encoding of its value.


(a) [program listing; begins with "proc foo"]

(b)
$\llbracket x = 3 \rrbracket = \{((v_1, v_2), (3, v_2)) \mid v_1, v_2 \in V\}$
$\llbracket x = 7 \rrbracket = \{((v_1, v_2), (7, v_2)) \mid v_1, v_2 \in V\}$
$\llbracket y = x \rrbracket = \{((v_1, v_2), (v_1, v_1)) \mid v_1, v_2 \in V\}$

Figure 2.2 (a) A Boolean program with two procedures and two global variables x and y
over a finite domain $V = \{0, 1, \ldots, 7\}$. (b) The (non-identity) transformers used in the
Boolean program. $v_1$ refers to a value of x and $v_2$ refers to a value of y.


$G$, and the set of local states $L$ is defined as follows: a local state consists of the value of
the program counter, a valuation of local variables from some $Val_i$, and the program stack
(which, for each unfinished call to a procedure $P$, contains a return address and a valuation
of the local variables at the time of the call to $P$). Let $N_i$ be the set of all CFG nodes of
procedure $i$. Then $L = (\bigcup_i (N_i \times Val_i))^+$, i.e., an element of $L$ is a non-empty
list of pairs, each drawn from $N_i \times Val_i$ for some $i$. For convenience, we write the
elements of $L$ with an overbar, e.g., $\bar{l}$, and use juxtaposition to denote list concatenation.

The effect of executing a statement st of procedure $i$, denoted by $\llbracket st \rrbracket$, is
a binary relation on $G \times Val_i$ that describes how values of variables in scope can change.
Fig. 2.2(b) shows the (non-identity) transformers used in Fig. 2.2(a).

The operational semantics of Boolean programs is shown in Fig. 2.3. The instantaneous
state of a Boolean program is an element of $G \times L$. The operational semantics defines how
the instantaneous state can change on the execution of a single statement in the program. Let
entry(f) denote the entry node of procedure f, proc(n) denote the procedure that contains
node n, and ep(n) denote entry(proc(n)); let exitnode(n) denote a predicate on nodes that is
true when n is the exit node of its procedure.
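\[
\frac{n \xrightarrow{\;st\;} m \qquad ((g_1, l_1), (g_2, l_2)) \in \llbracket st \rrbracket}
     {(g_1, (n, l_1)\,\bar{l}) \rightarrow (g_2, (m, l_2)\,\bar{l})}
\quad \textsc{Intra}
\]

\[
\frac{n \xrightarrow{\;\text{call } f\;} m \qquad e = entry(f) \qquad l \in Val_f}
     {(g_1, (n, l_1)\,\bar{l}) \rightarrow (g_1, (e, l)\,(m, l_1)\,\bar{l})}
\quad \textsc{Call}
\]

\[
\frac{exitnode(n)}
     {(g_1, (n, l_1)\,\bar{l}) \rightarrow (g_1, \bar{l})}
\quad \textsc{Return}
\]

Figure 2.3 Operational semantics of a Boolean program. In the rule Call, node m is the
return site for the procedure call at n.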

A Boolean program with only global variables can be thought of as an instance of a
dataflow model. In particular, it is one where a dataflow value is a subset of $G$, $\sqcup$ is
defined as union, and the dataflow transformer associated with an edge $(n, m)$ is
$\llbracket st \rrbracket$, where st is the statement on node n. In this case, $\mathrm{JOVP}_n$
denotes the set of all values that the variables can hold at node n. Assertion checking at node
n can be done using $\mathrm{JOVP}_n$. To encode a Boolean program with local variables, an
extended dataflow model is needed, such as the one used in [52], or pushdown systems
(Section 2.3) and their extensions (Chapter 3). The analysis of Boolean programs can be carried
out via their encoding into other models, which will be discussed in Chapter 3. A more direct
way of analyzing Boolean programs is discussed in Chapter 7.

The advantage of using Boolean programs is that one can encode branch conditions using
assume statements. (An assume statement is one that states a condition but does not change
the value of any variable.) For instance, the statement x == y in the program of Fig. 2.2(a)
would be associated with the transformer:

\[
\llbracket x == y \rrbracket = \{((v, v), (v, v)) \mid v \in V\}
\]

Such a statement would have the effect of only letting those states pass that satisfy the
condition x == y. For example, composing the transformers of each statement along the
program path x = 3; y = 7; x == y, in order, leads to the empty relation, i.e., the path
is infeasible.
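The infeasibility argument can be checked mechanically by representing each transformer as an explicit set of state pairs. The sketch below is illustrative only: the helper names (`assign_x`, `assign_y`, `assume_x_eq_y`, `compose`, `path_effect`) are hypothetical, and a state is assumed to be the pair (value of x, value of y) over the domain {0, ..., 7}.

```python
from itertools import product

V = range(8)                        # the finite domain {0, ..., 7}
STATES = list(product(V, V))        # a state is (value of x, value of y)

def assign_x(c):
    """Transformer for x = c."""
    return {((v1, v2), (c, v2)) for (v1, v2) in STATES}

def assign_y(c):
    """Transformer for y = c."""
    return {((v1, v2), (v1, c)) for (v1, v2) in STATES}

def assume_x_eq_y():
    """Transformer for the assume statement x == y."""
    return {((v, v), (v, v)) for v in V}

def compose(r1, r2):
    """Relational composition: run r1 first, then r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def path_effect(*transformers):
    """Compose the transformers of a program path, in order."""
    effect = {(s, s) for s in STATES}   # start from the identity relation
    for t in transformers:
        effect = compose(effect, t)
    return effect

infeasible = path_effect(assign_x(3), assign_y(7), assume_x_eq_y())
feasible = path_effect(assign_x(3), assign_y(3), assume_x_eq_y())
print(len(infeasible), len(feasible))   # 0 64
```

The first path yields the empty relation because no state with x = 3 and y = 7 passes the assume statement; the second path maps each of the 64 initial states to (3, 3).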

2.3  Pushdown  Systems 

A  pushdown  system  (PDS)  is  similar  to  a  pushdown  automaton  but  does  not  have  an 
input  tape.  It  is  simply  used  to  represent  a  transition  system. 

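Definition 2.3.1. A pushdown system is a triple $\mathcal{P} = (P, \Gamma, \Delta)$ where $P$ is
the set of states or control locations, $\Gamma$ is the set of stack symbols and
$\Delta \subseteq P \times \Gamma \times P \times \Gamma^*$ is the set of pushdown rules. A
configuration of $\mathcal{P}$ is a pair $\langle p, u \rangle$ where $p \in P$ and
$u \in \Gamma^*$. A rule $r \in \Delta$ is written as
$\langle p, \gamma \rangle \hookrightarrow \langle p', u' \rangle$ where $p, p' \in P$,
$\gamma \in \Gamma$ and $u' \in \Gamma^*$.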

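The rules of $\mathcal{P}$ define a transition relation $\Rightarrow_{\mathcal{P}}$ on the
configurations of $\mathcal{P}$ as follows: If
$r = \langle p, \gamma \rangle \hookrightarrow \langle p', u' \rangle$ then
$\langle p, \gamma u'' \rangle \Rightarrow_{\mathcal{P}} \langle p', u' u'' \rangle$ for all
$u'' \in \Gamma^*$. Moreover, if for two configurations $c$ and $c'$, $\sigma \in \Delta^*$ is a
rule sequence that transforms $c$ to $c'$, we say $c \Rightarrow^{\sigma} c'$.

The set of all rule sequences that transform $c$ to $c'$ is denoted as $paths(c, c')$.

The reflexive transitive closure of $\Rightarrow$ is denoted by $\Rightarrow^*$. For a set of
configurations $C$, we define
$pre^*_{\mathcal{P}}(C) = \{c' \mid \exists c \in C : c' \Rightarrow^* c\}$ and
$post^*_{\mathcal{P}}(C) = \{c' \mid \exists c \in C : c \Rightarrow^* c'\}$,
which are just backward and forward reachability under the transition relation $\Rightarrow$.
We drop the subscript $\mathcal{P}$ when there is no possibility of confusion.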
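To make the transition relation concrete, the following sketch computes the one-step successors of a configuration and explores forward reachability up to a stack-depth bound. It is an illustration under assumptions, not the dissertation's algorithm: rules are represented as hypothetical tuples `((p, gamma), (p2, w))` with `w` a tuple of stack symbols, and `post_bounded` deliberately under-approximates post* because the full set of reachable configurations can be infinite.

```python
def successors(rules, config):
    """All c2 with config => c2 in one step."""
    p, stack = config
    if not stack:
        return set()                    # no rule can fire on an empty stack
    return {(p2, w + stack[1:])         # replace the top symbol by w
            for ((q, gamma), (p2, w)) in rules
            if q == p and gamma == stack[0]}

def post_bounded(rules, start, max_depth):
    """Configurations reachable from `start` via stacks of at most max_depth symbols."""
    seen, frontier = {start}, [start]
    while frontier:
        c = frontier.pop()
        for c2 in successors(rules, c):
            if len(c2[1]) <= max_depth and c2 not in seen:
                seen.add(c2)
                frontier.append(c2)
    return seen

rules = [
    (("p", "a"), ("p", ("b", "c"))),   # <p, a> -> <p, b c>   (push rule)
    (("p", "b"), ("p", ())),           # <p, b> -> <p, eps>   (pop rule)
]
reach = post_bounded(rules, ("p", ("a",)), max_depth=4)
print(sorted(reach))
```

For this two-rule system the bound is never hit, so the result is exactly the three configurations with stacks a, b c, and c.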

We restrict PDS rules to have at most two stack symbols on the right-hand side. This
means that for every rule $r \in \Delta$ of the form
$\langle p, \gamma \rangle \hookrightarrow \langle p', u \rangle$, we have $|u| \le 2$. This
restriction does not decrease the power of pushdown systems because by increasing the number of
stack symbols by a constant factor, an arbitrary pushdown system can be converted into one
that satisfies this restriction [85].

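The standard approach for modeling program control flow with a pushdown system is as
follows: $P$ contains a single state $p$, $\Gamma$ corresponds to program locations, and
$\Delta$ corresponds to transitions in the interprocedural control-flow graph (ICFG), as shown
in Fig. 2.4. For instance, the rules that encode the ICFG shown in Fig. 2.1 are:

\[
\begin{array}{lll}
\langle p, e_{main} \rangle \hookrightarrow \langle p, n_1 \rangle &
\langle p, n_6 \rangle \hookrightarrow \langle p, n_9 \rangle &
\langle p, e_f \rangle \hookrightarrow \langle p, n_{10} \rangle \\
\langle p, n_1 \rangle \hookrightarrow \langle p, n_2 \rangle &
\langle p, n_6 \rangle \hookrightarrow \langle p, n_7 \rangle &
\langle p, n_{10} \rangle \hookrightarrow \langle p, n_{11} \rangle \\
\langle p, n_2 \rangle \hookrightarrow \langle p, n_3 \rangle &
\langle p, n_7 \rangle \hookrightarrow \langle p, e_f\, n_8 \rangle &
\langle p, n_{11} \rangle \hookrightarrow \langle p, n_{12} \rangle \\
\langle p, n_3 \rangle \hookrightarrow \langle p, e_f\, n_4 \rangle &
\langle p, n_8 \rangle \hookrightarrow \langle p, n_9 \rangle &
\langle p, n_{12} \rangle \hookrightarrow \langle p, x_f \rangle \\
\langle p, n_4 \rangle \hookrightarrow \langle p, n_5 \rangle &
\langle p, n_9 \rangle \hookrightarrow \langle p, x_{main} \rangle &
\langle p, n_{11} \rangle \hookrightarrow \langle p, n_{13} \rangle \\
\langle p, n_5 \rangle \hookrightarrow \langle p, n_6 \rangle &
\langle p, x_{main} \rangle \hookrightarrow \langle p, \varepsilon \rangle &
\langle p, n_{13} \rangle \hookrightarrow \langle p, x_f \rangle \\
 & & \langle p, x_f \rangle \hookrightarrow \langle p, \varepsilon \rangle
\end{array}
\]

\[
\begin{array}{ll}
\textbf{Rule} & \textbf{Control flow modeled} \\
\langle p, u \rangle \hookrightarrow \langle p, v \rangle &
  \text{Intraprocedural edge } u \to v \\
\langle p, c \rangle \hookrightarrow \langle p, e_f\, r \rangle &
  \text{Call to procedure } f \text{ from node } c \text{ that enters the procedure at } e_f
  \text{ and returns to } r \\
\langle p, x_f \rangle \hookrightarrow \langle p, \varepsilon \rangle &
  \text{Return from procedure } f \text{ at exit node } x_f
\end{array}
\]

Figure 2.4 The encoding of an ICFG's edges as PDS rules.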

Under such an encoding of a program, a PDS configuration can be thought of as a CFG
node with its calling context, i.e., the stack of return addresses of unfinished calls leading
up to the node. A rule $r = \langle p, \gamma \rangle \hookrightarrow \langle p', u \rangle$,
$u \in \Gamma^*$, is called a pop rule if $|u| = 0$, and a push rule if $|u| = 2$.

A  PDS  in  which  the  set  P  is  a  singleton  set  is  also  referred  to  as  a  context-free  process 
[16].  The  state  space  P  can  be  expanded  to  use  multiple  states  to  encode  a  finite  abstraction 
of  the  global  variables,  and  the  stack  alphabet  can  be  expanded  to  encode  local  variables 
[85].  This  technique  will  be  discussed  in  more  detail  below. 

Because  the  number  of  configurations  of  a  pushdown  system  is  unbounded,  it  is  useful  to 
use  finite  automata  to  describe  certain  infinite  sets  of  configurations. 

Definition 2.3.2. If $\mathcal{P} = (P, \Gamma, \Delta)$ is a pushdown system then a
$\mathcal{P}$-automaton is a finite automaton $(Q, \Gamma, \rightarrow, P, F)$ where
$Q \supseteq P$ is a finite set of states, $\rightarrow \subseteq Q \times \Gamma \times Q$ is the
transition relation, $P$ is the set of initial states, and $F$ is the set of final states of the
automaton. We say that a configuration $\langle p, u \rangle$ is accepted by a
$\mathcal{P}$-automaton if the automaton can accept $u$ when it is started in the state $p$
(written as $p \xrightarrow{u}{}^{*} q$, where $q \in F$). A set of configurations is called
regular if some $\mathcal{P}$-automaton accepts it. Without loss of generality,
$\mathcal{P}$-automata are restricted to not have any transitions leading to an initial state.

An  important  result  is  that  for  a  regular  set  of  configurations  C,  both  post*(C)  and 
pre*(C )  are  also  regular  sets  of  configurations  [15,  85,  9,  30,  33]. 

2.3.1  Encoding  Boolean  programs  using  PDSs 

To encode a Boolean program $B$ using a PDS, the state alphabet $P$ is expanded to encode
the values of global variables, and the stack alphabet $\Gamma$ is expanded to encode the values
of local variables [85].

Let $N_i$ be the set of CFG nodes of the $i$th procedure of $B$. Let $G$ and $L$ be the sets of
global and local states of $B$, respectively, as defined in Section 2.2. Let $Val_i$ be the set of
valuations of local variables of the $i$th procedure.

We set $P$ to be $G$, and $\Gamma$ to be the union of $N_i \times Val_i$ over all procedures.
(Note that the set of local states $L$ equals $\Gamma^+$.) The PDS rules for the $i$th procedure
are constructed as follows: (i) an intraprocedural ICFG edge $u \to v$ with statement st is
encoded via a set of rules
$\langle g, (u, l) \rangle \hookrightarrow \langle g', (v, l') \rangle$, for each
$((g, l), (g', l')) \in \llbracket st \rrbracket$; (ii) a call edge $c \to r$ that calls
procedure $f$, with enter node $e_f$, is encoded via a set of rules
$\langle g, (c, l) \rangle \hookrightarrow \langle g, (e_f, l_0)\,(r, l) \rangle$, for each
$(g, l) \in G \times Val_i$ and $l_0 \in Val_f$; (iii) a procedure return at node $u$ is encoded
via a set of rules $\langle g, (u, l) \rangle \hookrightarrow \langle g, \varepsilon \rangle$, for
each $(g, l) \in G \times Val_i$.

Under such an encoding of a Boolean program as a PDS, a configuration
$\langle p, \gamma_1 \gamma_2 \cdots \gamma_n \rangle$ of the PDS is an element of $G \times L$
that describes the instantaneous state of the Boolean program. The state $p$ encodes the values
of global variables; $\gamma_1$ encodes the current program counter and the values of local
variables in scope; and the rest of the stack encodes the list of unfinished calls with the
values of local variables at the time the call was made. The reader can verify that the PDS
transition relation ($\Rightarrow$) is the same as the single-step execution relation
($\rightarrow$) defined in Fig. 2.3.

2.3.2  Solving  Reachability  on  PDSs  using  Saturation-Based  Algorithms

In  this  section,  we  show  how  to  compute  post*(C )  and  pre*(C)  for  a  regular  set  of 
configurations  C.3  This  will  serve  to  lay  out  some  of  the  concepts  that  we  will  use  in 
designing  algorithms  for  more  advanced  abstract  models  in  later  chapters. 

The algorithms for computing $post^*$ and $pre^*$, called poststar and prestar, respectively,
take a $\mathcal{P}$-automaton $\mathcal{A}$ as input, and if $C$ is the set of configurations
accepted by $\mathcal{A}$, they produce $\mathcal{P}$-automata $\mathcal{A}_{post^*}$ and
$\mathcal{A}_{pre^*}$ that accept the sets of configurations $post^*(C)$ and $pre^*(C)$,
respectively [9, 30, 33]. Both poststar and prestar can be implemented as saturation
procedures; i.e., transitions are added to $\mathcal{A}$ according to some saturation rule until
no more can be added.

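Algorithm prestar: $\mathcal{A}_{pre^*}$ can be constructed from $\mathcal{A}$ using the
following saturation rule: If $\langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle$
and $p' \xrightarrow{w}{}^{*} q$ in the current automaton, add a transition $(p, \gamma, q)$.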

This algorithm is based on the intuition that if the automaton accepts a configuration
$c$ and a rule $r$ allows the transition $c' \Rightarrow c$, then the automaton needs to accept
$c'$ as well: If there is an accepting path starting in state $q$ that accepts $u$, then the
automaton accepts the configuration $c = \langle p', wu \rangle$. The rule
$r = \langle p, \gamma \rangle \hookrightarrow \langle p', w \rangle$ allows the transition
$c' \Rightarrow c$, where $c' = \langle p, \gamma u \rangle$. The addition of the transition
$(p, \gamma, q)$ allows $c'$ to be accepted by the automaton.

Termination  of  the  algorithm  follows  from  the  fact  that  the  number  of  states  of  the 
automaton  does  not  increase  (hence,  the  number  of  transitions  is  bounded). 
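The saturation rule for prestar translates almost directly into code. The following is a naive sketch (it rescans all rules until nothing changes, whereas efficient implementations use worklists); the names `read`, `prestar`, and `accepts`, and the tuple encoding of rules and transitions, are assumptions made for illustration.

```python
def read(trans, state, word):
    """States reachable from `state` by reading `word` in the automaton."""
    current = {state}
    for sym in word:
        current = {q2 for (q1, a, q2) in trans if q1 in current and a == sym}
    return current

def prestar(rules, trans):
    """Saturate: for <p, g> -> <p2, w> and p2 --w-->* q, add (p, g, q)."""
    trans = set(trans)
    changed = True
    while changed:
        changed = False
        for (p, gamma), (p2, w) in rules:
            for q in read(trans, p2, w):
                if (p, gamma, q) not in trans:
                    trans.add((p, gamma, q))
                    changed = True
    return trans

def accepts(trans, final, config):
    p, stack = config
    return bool(read(trans, p, stack) & final)

rules = [
    (("p", "a"), ("p", ("b", "c"))),   # <p, a> -> <p, b c>
    (("p", "b"), ("p", ())),           # <p, b> -> <p, eps>
]
query = {("p", "c", "f")}              # automaton accepting only <p, c>
pre = prestar(rules, query)
print(sorted(pre))
print(accepts(pre, {"f"}, ("p", ("b", "b", "c"))))   # True
```

No states are added, so the loop stops after at most |Q| x |Gamma| x |Q| insertions, matching the termination argument above.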

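Algorithm poststar: $\mathcal{A}_{post^*}$ can be constructed from $\mathcal{A}$ by performing
Phase I and then saturating via the rules given in Phase II:

•  Phase I. For each pair $(p', \gamma')$ such that $\mathcal{P}$ contains at least one rule of
the form $\langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle$, add a
new state $p'_{\gamma'}$.

•  Phase II (saturation phase). (The symbol $\overset{\gamma}{\leadsto}$ denotes the relation
$(\xrightarrow{\varepsilon})^* \xrightarrow{\gamma} (\xrightarrow{\varepsilon})^*$.)

—  If $\langle p, \gamma \rangle \hookrightarrow \langle p', \varepsilon \rangle \in \Delta$ and
$p \overset{\gamma}{\leadsto} q$ in the current automaton, add a transition
$(p', \varepsilon, q)$.

—  If $\langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle \in \Delta$ and
$p \overset{\gamma}{\leadsto} q$ in the current automaton, add a transition $(p', \gamma', q)$.

—  If $\langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta$
and $p \overset{\gamma}{\leadsto} q$ in the current automaton, add the transitions
$(p', \gamma', p'_{\gamma'})$ and $(p'_{\gamma'}, \gamma'', q)$.

3The material in this section is adapted from [83].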

This algorithm is based on intuition similar to that for prestar. The difference is that
poststar adds more states to the automaton. These states are needed to accommodate
configurations that are added because of a push rule. One has to argue that reusing these
states for different applications of (possibly distinct) call rules is correct. The interested
reader is referred to the original papers for this proof [9, 30, 33].
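A naive rendering of the two phases of poststar is sketched below. It is illustrative only and makes several assumptions: the empty string stands for an epsilon label, the Phase I state for a pair (p', gamma') is represented by the Python tuple `(p2, w[0])` and created on demand, and the helper names are hypothetical. Efficient implementations avoid rescanning all rules.

```python
EPS = ""   # assumption: the empty string labels epsilon-transitions

def eps_closure(trans, states):
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for (q1, a, q2) in trans:
            if q1 == s and a == EPS and q2 not in closure:
                closure.add(q2)
                work.append(q2)
    return closure

def read_sym(trans, state, sym):
    """States q with state ~sym~> q, i.e. (eps)* sym (eps)*."""
    before = eps_closure(trans, {state})
    after = {q2 for (q1, a, q2) in trans if q1 in before and a == sym}
    return eps_closure(trans, after)

def poststar(rules, trans):
    trans = set(trans)
    changed = True
    while changed:
        changed = False
        for (p, gamma), (p2, w) in rules:
            for q in read_sym(trans, p, gamma):
                if len(w) == 0:
                    new = [(p2, EPS, q)]
                elif len(w) == 1:
                    new = [(p2, w[0], q)]
                else:
                    mid = (p2, w[0])            # the Phase I state for (p', gamma')
                    new = [(p2, w[0], mid), (mid, w[1], q)]
                for t in new:
                    if t not in trans:
                        trans.add(t)
                        changed = True
    return trans

def accepts(trans, final, config):
    p, stack = config
    current = eps_closure(trans, {p})
    for sym in stack:
        step = {q2 for (q1, a, q2) in trans if q1 in current and a == sym}
        current = eps_closure(trans, step)
    return bool(current & final)

rules = [(("p", "a"), ("p", ("b", "c"))), (("p", "b"), ("p", ()))]
post = poststar(rules, {("p", "a", "f")})        # start from {<p, a>}
print(accepts(post, {"f"}, ("p", ("c",))))       # True
```

Starting from the single configuration with stack a, the saturated automaton accepts exactly the stacks a, b c, and c for this two-rule system.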

Example 2.3.3. Given the PDS that encodes the ICFG from Fig. 2.1 and the query
automaton $\mathcal{A}$ shown in Fig. 2.5(a), which accepts the language
$\{\langle p, e_{main} \rangle\}$, poststar produces the automaton $\mathcal{A}_{post^*}$ shown
in Fig. 2.5(b).

2.3.3  Solving  Pre-Reachability  on  PDSs  using  Context-Free  Grammars

While most implementations of PDS reachability use the saturation-based algorithms,
there are different ways of looking at the reachability problem.4 In particular, we show how
context-free grammars can be used to find not only the set of reachable configurations, but
also the set of paths (rule sequences) that justify their reachability. In this section, fix a
PDS $\mathcal{P} = (P, \Gamma, \Delta)$ and a $\mathcal{P}$-automaton
$\mathcal{A} = (Q, \Gamma, \rightarrow, P, F)$.


4The  material  in  this  section  is  adapted  from  [83]. 
Figure 2.5 (a) Automaton for the input language of configurations
$\{\langle p, e_{main} \rangle\}$; (b) automaton for $post^*(\{\langle p, e_{main} \rangle\})$
(computed for the PDS that encodes the ICFG from Fig. 2.1).


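\[
\begin{array}{lll}
 & \textbf{Production} & \textbf{for each} \\
(1) & \mathit{PopRuleSeq}_{(q, \gamma, q')} \rightarrow r &
  r = \langle q, \gamma \rangle \hookrightarrow \langle q', \varepsilon \rangle \in \Delta \\
(2) & \mathit{PopRuleSeq}_{(p, \gamma, q)} \rightarrow r\ \mathit{PopRuleSeq}_{(p', \gamma', q)} &
  r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle \in \Delta,\ q \in P \\
(3) & \mathit{PopRuleSeq}_{(p, \gamma, q)} \rightarrow r\ \mathit{PopRuleSeq}_{(p', \gamma', q')}\
      \mathit{PopRuleSeq}_{(q', \gamma'', q)} &
  r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta,\
  q, q' \in P
\end{array}
\]

Figure 2.6 The PopRuleSeq grammar for a PDS $\mathcal{P} = (P, \Gamma, \Delta)$.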


Consider the PopRuleSeq grammar shown in Fig. 2.6. The non-terminals of the grammar
are $\mathit{PopRuleSeq}_{(p, \gamma, q)}$ for all $p, q \in P$ and $\gamma \in \Gamma$, and the
set of terminals is $\Delta$. An important property of this grammar is as follows.

Lemma 2.3.4. The set of strings derived by the non-terminal
$\mathit{PopRuleSeq}_{(p, \gamma, q)}$ of the grammar shown in Fig. 2.6 is exactly the set
$paths_{\mathcal{P}}(\langle p, \gamma \rangle, \langle q, \varepsilon \rangle)$.

The proof of Lem. 2.3.4 follows quite easily by induction on the length of a rule sequence.

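We extend Lem. 2.3.4 to capture the set of all rule sequences from a given configuration.
First, observe that every rule sequence
$\sigma \in paths(\langle p_1, \gamma_1 \gamma_2 \cdots \gamma_n \rangle, \langle p, \varepsilon \rangle)$
can be decomposed as $\sigma = \sigma_1 \sigma_2 \cdots \sigma_n$ (see Fig. 2.7) such that
$\sigma_i \in paths(\langle p_i, \gamma_i \rangle, \langle p_{i+1}, \varepsilon \rangle)$ for
$1 \le i \le n$, and $p_{n+1} = p$. Intuitively, this holds because for a path to look at
$\gamma_2$, it must first pop off $\gamma_1$, and then repeat this until the stack becomes empty.
This implies the following two results.

\[
\langle p_1, \gamma_1 \gamma_2 \gamma_3 \cdots \gamma_n \rangle
\Rightarrow^{\sigma_1} \langle p_2, \gamma_2 \gamma_3 \cdots \gamma_n \rangle
\Rightarrow^{\sigma_2} \langle p_3, \gamma_3 \cdots \gamma_n \rangle
\Rightarrow \cdots \Rightarrow \langle p_n, \gamma_n \rangle
\Rightarrow^{\sigma_n} \langle p_{n+1}, \varepsilon \rangle
\]

Figure 2.7 A path in the transition relation of a PDS from the configuration
$\langle p_1, \gamma_1 \gamma_2 \gamma_3 \cdots \gamma_n \rangle$ to the configuration
$\langle p_{n+1}, \varepsilon \rangle$.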

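Lemma 2.3.5. For any PDS $\mathcal{P} = (P, \Gamma, \Delta)$, the set
$paths_{\mathcal{P}}(\langle p_1, \gamma_1 \gamma_2 \cdots \gamma_n \rangle, \langle p_{n+1}, \varepsilon \rangle)$
equals the union of the sets
$(paths_{\mathcal{P}}(\langle p_1, \gamma_1 \rangle, \langle p_2, \varepsilon \rangle) \,.\,
paths_{\mathcal{P}}(\langle p_2, \gamma_2 \rangle, \langle p_3, \varepsilon \rangle) \,.\, \cdots \,.\,
paths_{\mathcal{P}}(\langle p_n, \gamma_n \rangle, \langle p_{n+1}, \varepsilon \rangle))$,
where "." denotes elementwise concatenation of sets of strings, over all possible choices of
$p_2, p_3, \ldots, p_n \in P$.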

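Corollary 2.3.6. Let $\mathcal{L}(N)$ be the language that can be derived from a non-terminal
$N$. For a PDS $\mathcal{P} = (P, \Gamma, \Delta)$, the set of rule sequences
$paths_{\mathcal{P}}(\langle p_1, \gamma_1 \gamma_2 \cdots \gamma_n \rangle, \langle p, \varepsilon \rangle)$
is the union of the sets
$(\mathcal{L}(\mathit{PopRuleSeq}_{(p_1, \gamma_1, p_2)}) \,.\,
\mathcal{L}(\mathit{PopRuleSeq}_{(p_2, \gamma_2, p_3)}) \cdots
\mathcal{L}(\mathit{PopRuleSeq}_{(p_n, \gamma_n, p)}))$
over all possible choices of $p_2, p_3, \ldots, p_n \in P$.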

Next, we show that Cor. 2.3.6 is sufficient to compute the $pre^*$ set. Let
$\mathcal{L}(\mathcal{A})$ be the set of configurations accepted by $\mathcal{A}$. We absorb
$\mathcal{A}$ into the PDS $\mathcal{P}$ in order to only consider a single system as follows:

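Definition 2.3.7. Given a PDS $\mathcal{P} = (P, \Gamma, \Delta)$ and a
$\mathcal{P}$-automaton $\mathcal{A} = (Q, \Gamma, \rightarrow, P, F)$, their combined PDS
$\mathcal{P}_{\mathcal{A}}$ is defined to be $(Q, \Gamma, \Delta \cup \Delta')$, where $\Delta'$
contains a pop rule $\langle p, \gamma \rangle \hookrightarrow \langle q, \varepsilon \rangle$
for every transition $(p, \gamma, q)$ in $\mathcal{A}$.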

The PDS $\mathcal{P}_{\mathcal{A}}$ can either operate like $\mathcal{P}$, by using a rule in
$\Delta$, or like $\mathcal{A}$, by using a rule in $\Delta'$. Because $\mathcal{A}$ has no
incoming transitions to states in $P$ (Defn. 2.3.2), a valid rule sequence $\sigma$ of
$\mathcal{P}_{\mathcal{A}}$ (i.e., $\sigma \in paths(c, c')$ for some configurations $c$ and
$c'$) must be from the set $\Delta^* (\Delta')^*$, i.e., it is a sequence of rules from $\Delta$
followed by a sequence of rules from $\Delta'$. This is because when a rule in $\Delta'$ is used,
the target configuration must have a control state from the set $Q - P$, after which a rule in
$\Delta$ cannot fire, and rules in $\Delta'$ keep the control state in $Q - P$. The first phase,
consisting of rules from $\Delta$, is also a valid rule sequence for $\mathcal{P}$; the second
phase, consisting of rules from $\Delta'$, simulates running automaton $\mathcal{A}$. The
following lemma is based on this observation.

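Lemma 2.3.8. Let $\mathcal{P} = (P, \Gamma, \Delta)$ be a PDS and
$\mathcal{A} = (Q, \Gamma, \rightarrow, P, F)$ be a $\mathcal{P}$-automaton. Let
$\mathcal{P}_{\mathcal{A}}$ be their combined PDS. Then $c' \Rightarrow^*_{\mathcal{P}} c$ for
some $c \in \mathcal{L}(\mathcal{A})$, if and only if
$c' \Rightarrow^*_{\mathcal{P}_{\mathcal{A}}} \langle q_f, \varepsilon \rangle$ for some
$q_f \in F$. Consequently,
$pre^*_{\mathcal{P}}(\mathcal{L}(\mathcal{A})) =
project_P(pre^*_{\mathcal{P}_{\mathcal{A}}}(\{\langle q_f, \varepsilon \rangle \mid q_f \in F\}))$,
where $project_P(S)$ consists of all configurations in $S$ whose control state is in $P$.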

Given a configuration $c$, we can compute the set of all rule sequences that take $c$ to a
configuration accepted by $\mathcal{A}$ as follows: For each $q_f \in F$, apply Cor. 2.3.6 to the
PopRuleSeq grammar of $\mathcal{P}_{\mathcal{A}}$, with $p = q_f$, to obtain the set $S_{q_f}$ of
all paths from $c$ to $\langle q_f, \varepsilon \rangle$ in $\mathcal{P}_{\mathcal{A}}$. Next,
take the union of these sets over all states in $F$, and remove the rules in $\Delta'$. The
resultant set is the desired answer.

Because the set of such accepting paths can be unbounded, computing it explicitly may
not be possible. However, the above technique does allow us to get a handle on the set of all
accepting paths. By replacing the rules with other quantities, we can compute other values
of interest. For instance, for weighted pushdown systems, the rules are replaced by weights
to compute the net effect of all paths between two given configurations (Section 2.4.1). If
one is only interested in the set of reachable configurations, then the rules can be replaced
by $\varepsilon$, leading to the grammar shown in Fig. 2.8, which has some interesting properties.

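\[
\begin{array}{lll}
 & \textbf{Production} & \textbf{for each} \\
(1) & \mathit{PopSeq}_{(q, \gamma, q')} \rightarrow \varepsilon &
  \langle q, \gamma \rangle \hookrightarrow \langle q', \varepsilon \rangle \in (\Delta \cup \Delta') \\
(2) & \mathit{PopSeq}_{(p, \gamma, q)} \rightarrow \mathit{PopSeq}_{(p', \gamma', q)} &
  \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle \in \Delta,\ q \in Q \\
(3) & \mathit{PopSeq}_{(p, \gamma, q)} \rightarrow \mathit{PopSeq}_{(p', \gamma', q')}\
      \mathit{PopSeq}_{(q', \gamma'', q)} &
  \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta,\
  q, q' \in Q
\end{array}
\]

Figure 2.8 The PopSeq grammar for PDS $\mathcal{P}_{\mathcal{A}}$.

In the PopSeq grammar, $\mathit{PopSeq}_{(p, \gamma, q)}$ can derive $\varepsilon$ if and only if
$paths_{\mathcal{P}_{\mathcal{A}}}(\langle p, \gamma \rangle, \langle q, \varepsilon \rangle)$ is
non-empty. Also note the similarity of this grammar with the saturation-based algorithm
prestar presented earlier: each grammar production that was created because of a rule
$r \in \Delta$ corresponds to an instance of the saturation rule for $r$. For example, in
prestar, the rule $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle$
dictates the following: if $(p', \gamma', q)$ is a transition in the current automaton, then add
a transition $(p, \gamma, q)$; in the grammar, it states that if
$\mathit{PopSeq}_{(p', \gamma', q)}$ can derive $\varepsilon$, then so can
$\mathit{PopSeq}_{(p, \gamma, q)}$. This leads to the following result.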

Lemma 2.3.9. The non-terminal $\mathit{PopSeq}_{(p, \gamma, q)}$ of the grammar shown in
Fig. 2.8 can derive $\varepsilon$ if and only if the transition $(p, \gamma, q)$ exists in the
final automaton $\mathcal{A}_{pre^*}$ produced by prestar($\mathcal{A}$).

Lem. 2.3.9 along with Cor. 2.3.6 justifies the correctness of the prestar algorithm, i.e.,
the fact that $\mathcal{L}(\mathcal{A}_{pre^*}) = pre^*(\mathcal{L}(\mathcal{A}))$.
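The correspondence stated in Lem. 2.3.9 can be observed on a small instance by computing which PopSeq non-terminals derive the empty string. The sketch below is an illustration with hypothetical names and encodings: a non-terminal with subscript (p, gamma, q) is the Python triple `(p, gamma, q)`, productions are never materialized, and the set of nullable non-terminals is grown to a fixpoint.

```python
def nullable_popseq(rules, trans):
    """Triples (p, g, q) whose PopSeq non-terminal can derive the empty string."""
    null = set(trans)    # production (1) for the pop rules obtained from the automaton
    changed = True
    while changed:
        changed = False
        for (p, g), (p2, w) in rules:
            if len(w) == 0:
                new = {(p, g, p2)}                              # production (1)
            elif len(w) == 1:
                new = {(p, g, q)                                # production (2)
                       for (a, b, q) in null if (a, b) == (p2, w[0])}
            else:
                new = {(p, g, q)                                # production (3)
                       for (a, b, q1) in null if (a, b) == (p2, w[0])
                       for (c, d, q) in null if (c, d) == (q1, w[1])}
            if not new <= null:
                null |= new
                changed = True
    return null

rules = [
    (("p", "a"), ("p", ("b", "c"))),   # <p, a> -> <p, b c>
    (("p", "b"), ("p", ())),           # <p, b> -> <p, eps>
]
automaton = {("p", "c", "f")}          # accepts only <p, c>
print(sorted(nullable_popseq(rules, automaton)))
```

For this input the nullable triples are (p, a, f), (p, b, p), and (p, c, f), which are the transitions one obtains by applying the prestar saturation rule by hand to the same automaton.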

2.3.4  Solving  Post-Reachability  on  PDSs  using  Context-Free  Grammars

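The situation is similar for computing $post^*$ using context-free grammars, but slightly
more complicated.5 The complication arises from the fact that the PDS has to be massaged
into a different form before we obtain a notion that is analogous to pop sequences.

Let $Q_{mid}$ be a set that contains a state $p'_{\gamma'}$ for every push rule
$\langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle$ in $\Delta$.
Let $\mathcal{P}_e = (P \cup Q_{mid}, \Gamma, \Delta_e)$, where $\Delta_e$ contains every rule
from $\Delta$ with zero or one stack symbols on the right-hand side; and for every push rule
$r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta$,
$\Delta_e$ contains two rules:
$\eta_1(r) = \langle p, \gamma \rangle \hookrightarrow \langle p'_{\gamma'}, \gamma'' \rangle$ and
$\eta_2(r) = \langle p'_{\gamma'}, \varepsilon \rangle \hookrightarrow \langle p', \gamma' \rangle$.
Note that this allows the addition of rules with no stack symbols on the left-hand side. Such
rules can fire without consuming the top symbol of the stack, i.e., the rule
$\langle p, \varepsilon \rangle \hookrightarrow \langle q, \gamma \rangle$ contributes to the
transition relation of the PDS as follows:
$\langle p, u \rangle \Rightarrow \langle q, \gamma u \rangle$ for every $u \in \Gamma^*$.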

5 Again,  the  material  in  this  section  is  adapted  from  [83] . 
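\[
\begin{array}{lll}
 & \textbf{Production} & \textbf{for each} \\
(1) & \mathit{SameLevelRuleSeq}_{(p', \varepsilon, q)} \rightarrow
      \mathit{PushRuleSeq}_{(p, \gamma, q)}\ r &
  r = \langle p, \gamma \rangle \hookrightarrow \langle p', \varepsilon \rangle \in \Delta,\
  q \in P_e \\
(2) & \mathit{PushRuleSeq}_{(p', \gamma', q')} \rightarrow
      \mathit{PushRuleSeq}_{(q, \gamma', q')}\ \mathit{SameLevelRuleSeq}_{(p', \varepsilon, q)} &
  p' \in P,\ q, q' \in P_e \\
(3) & \mathit{PushRuleSeq}_{(p', \gamma', q)} \rightarrow
      \mathit{PushRuleSeq}_{(p, \gamma, q)}\ r &
  r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle \in \Delta,\
  q \in P_e \\
(4) & \mathit{PushRuleSeq}_{(p', \gamma', p'_{\gamma'})} \rightarrow \eta_2(r) &
  r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta \\
(5) & \mathit{PushRuleSeq}_{(p'_{\gamma'}, \gamma'', q)} \rightarrow
      \mathit{PushRuleSeq}_{(p, \gamma, q)}\ \eta_1(r) &
  r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta,\
  q \in P_e
\end{array}
\]

Figure 2.9 The PushRuleSeq grammar for PDS $\mathcal{P}_e$. The set $P_e$ is defined as
$(P \cup Q_{mid})$.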


The  non-terminals  of  the  grammar  shown  in  Fig.  2.6  derived  a  language  of  rule  sequences, 
each  of  which  could  pop  off  one  symbol  from  the  top  of  the  stack.  Now  we  define  a  grammar 
whose  non-terminals  derive  rule  sequences  that  can  push  one  symbol  on  the  top  of  the  stack. 
This  grammar  is  shown  in  Fig.  2.9. 

Lemma 2.3.10. The set of strings derived by the non-terminal
$\mathit{PushRuleSeq}_{(p, \gamma, q)}$ of the grammar shown in Fig. 2.9 is exactly the set
$paths_{\mathcal{P}_e}(\langle q, \varepsilon \rangle, \langle p, \gamma \rangle)$.

Note that in any rule sequence of $\mathcal{P}_e$ between two configurations
$\langle p_1, u_1 \rangle$ and $\langle p_2, u_2 \rangle$ such that $p_1, p_2 \in P$, the rules
$\eta_1(r)$ and $\eta_2(r)$ must appear adjacent to each other. This is because once the state
changes to one in $Q_{mid}$, only a rule of the form $\eta_2(r)$ can fire. We can convert a
valid rule sequence $\sigma_e \in \Delta_e^*$ to one in $\Delta^*$ by replacing each sequence of
rules $(\eta_1(r); \eta_2(r))$ in $\sigma_e$ with $r$. (This is possible because $\eta_1$ is
invertible.)

Similar to Cor. 2.3.6, we can use the PushRuleSeq grammar to find all paths in the set
$paths_{\mathcal{P}_e}(\langle p, \varepsilon \rangle, \langle p_{n+1}, \gamma_{n+1} \cdots \gamma_1 \rangle)$.

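We define the combined PDS-automaton system as follows: the PDS $\mathcal{A}^R\mathcal{P}$ is
the tuple $(Q \cup Q_{mid}, \Gamma, \Delta_e)$, where $\Delta_e$ contains every rule from
$\Delta$ with zero or one stack symbols on the right-hand side; for every push rule
$\langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle$, $\Delta_e$
contains two rules:
$\langle p, \gamma \rangle \hookrightarrow \langle p'_{\gamma'}, \gamma'' \rangle$ and
$\langle p'_{\gamma'}, \varepsilon \rangle \hookrightarrow \langle p', \gamma' \rangle$; and for
every transition $(q, \gamma, q')$ in $\mathcal{A}$, $\Delta_e$ contains the rule
$\langle q', \varepsilon \rangle \hookrightarrow \langle q, \gamma \rangle$. A rule sequence in
this PDS first generates a configuration in the language of $\mathcal{A}$ and then fires a rule
sequence from $\mathcal{P}_e$. The saturation-based algorithm poststar is in direct
correspondence with the PushRuleSeq grammar when the rules are replaced with $\varepsilon$.

\[
\begin{array}{lll}
 & \textbf{Production} & \textbf{for each} \\
(1) & \mathit{PushSeq}_{(q, \gamma, q')} \rightarrow \varepsilon &
  (q, \gamma, q') \in\ \rightarrow \\
(2) & \mathit{SameLevelSeq}_{(p', \varepsilon, q)} \rightarrow \mathit{PushSeq}_{(p, \gamma, q)} &
  \langle p, \gamma \rangle \hookrightarrow \langle p', \varepsilon \rangle \in \Delta,\
  q \in Q_e \\
(3) & \mathit{PushSeq}_{(p', \gamma', q')} \rightarrow \mathit{PushSeq}_{(q, \gamma', q')}\
      \mathit{SameLevelSeq}_{(p', \varepsilon, q)} &
  p' \in P,\ q, q' \in Q_e \\
(4) & \mathit{PushSeq}_{(p', \gamma', q)} \rightarrow \mathit{PushSeq}_{(p, \gamma, q)} &
  \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle \in \Delta,\
  q \in Q_e \\
(5) & \mathit{PushSeq}_{(p', \gamma', p'_{\gamma'})} \rightarrow \varepsilon &
  \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta \\
(6) & \mathit{PushSeq}_{(p'_{\gamma'}, \gamma'', q)} \rightarrow \mathit{PushSeq}_{(p, \gamma, q)} &
  \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta,\
  q \in Q_e
\end{array}
\]

Figure 2.10 The PushSeq grammar for PDS $\mathcal{A}^R\mathcal{P}$. The set $Q_e$ is defined as
$(Q \cup Q_{mid})$.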

Lemma 2.3.11. The non-terminal $\mathit{PushSeq}_{(p, \gamma, q)}$ of the grammar shown in
Fig. 2.10 can derive $\varepsilon$ if and only if the transition $(p, \gamma, q)$ exists in the
final automaton $\mathcal{A}_{post^*}$ produced by poststar($\mathcal{A}$).

2.4  Weighted  Pushdown  Systems 

A weighted pushdown system is obtained by supplementing a pushdown system with a
weight domain that is a bounded idempotent semiring [82, 10]. Such semirings are powerful
enough to encode finite-state data abstractions such as the one required to encode Boolean
programs, as well as infinite-state data abstractions, such as copy-constant propagation and
affine-relation analysis [60]. The basic idea is to use weights to encode the effect that each
rule has on the data state of the program.

Definition 2.4.1. A bounded idempotent semiring is a quintuple $(D, \oplus, \otimes, 0, 1)$,
where $D$ is a set whose elements are called weights, 0 and 1 are elements of $D$, and $\oplus$
(the combine operation) and $\otimes$ (the extend operation) are binary operators on $D$ such that

1.  $(D, \oplus)$ is a commutative monoid with 0 as its neutral element, and where $\oplus$ is
idempotent. $(D, \otimes)$ is a monoid with the neutral element 1.

2.  $\otimes$ distributes over $\oplus$, i.e., for all $a, b, c \in D$ we have
$a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$ and
$(a \oplus b) \otimes c = (a \otimes c) \oplus (b \otimes c)$.

3.  0 is an annihilator with respect to $\otimes$, i.e., for all $a \in D$,
$a \otimes 0 = 0 = 0 \otimes a$.

4.  In the partial order $\sqsubseteq$ defined by $\forall a, b \in D$, $b \sqsubseteq a$ iff
$a \oplus b = a$, there are no infinite ascending chains.

In dataflow-analysis terms, D is a set of dataflow transformers, $\oplus$ is join, $\otimes$ is transformer
composition, $\overline{0}$ is the infeasible transformer, and $\overline{1}$ is the identity transformer.

The  height  of  a  weight  domain  is  defined  to  be  the  length  of  the  longest  ascending  chain 
in  the  domain.  For  simplicity,  when  we  discuss  complexity  results,  we  will  assume  that  the 
height  is  bounded,  but  WPDSs,  and  the  algorithms  in  this  dissertation,  can  also  be  used  in 
certain  cases  when  the  height  is  unbounded  (as  long  as  the  condition  in  Defn.  2.4.1,  item  4 
is  satisfied). 

Definition 2.4.2. A weighted pushdown system is a triple $\mathcal{W} = (\mathcal{P}, \mathcal{S}, f)$ where $\mathcal{P} =
(P, \Gamma, \Delta)$ is a pushdown system, $\mathcal{S} = (D, \oplus, \otimes, \overline{0}, \overline{1})$ is a bounded idempotent semiring and
$f : \Delta \rightarrow D$ is a map that assigns a weight to each pushdown rule.

Let $\sigma \in \Delta^*$ be a sequence of rules. Using f, we can associate a value to $\sigma$, i.e., if
$\sigma = [r_1, \ldots, r_k]$, then we define $v(\sigma) = f(r_1) \otimes \cdots \otimes f(r_k)$. The set of all rule sequences from
a configuration in S to a configuration in T is denoted as paths(S, T).

Definition 2.4.3. Let $\mathcal{W} = (\mathcal{P}, \mathcal{S}, f)$ be a WPDS, where $\mathcal{P} = (P, \Gamma, \Delta)$, and let $S, T \subseteq P \times \Gamma^*$
be regular sets of configurations. The interprocedural join-over-all-paths (IJOP) value
IJOP(S, T) is defined as $\bigoplus \{ v(\sigma) \mid s \Rightarrow^{\sigma} t,\ s \in S,\ t \in T \}$. A set of witnesses $\omega(S, T)$ for
this value is defined as a finite set of paths (rule sequences) $\{\sigma_1, \ldots, \sigma_n\}$, $\sigma_i \in$ paths(S, T),
such that $\bigoplus_i v(\sigma_i) =$ IJOP(S, T).

The  IJOP  value  describes  the  net  transformation  that  occurs  when  going  from  one  set  of 
configurations  to  another.  The  set  of  witnesses  gives  a  finite  number  of  paths  that  together 
justify the reported IJOP value. The WPDS reachability problems, which compute the set
of  forward  and  backward  reachable  states,  are  defined  as  follows. 

Definition 2.4.4. Let $\mathcal{W} = (\mathcal{P}, \mathcal{S}, f)$ be a weighted pushdown system, where $\mathcal{P} = (P, \Gamma, \Delta)$,
and let $C \subseteq P \times \Gamma^*$ be a regular set of configurations. The generalized pushdown
predecessor (GPP) problem GPP(C) is to find for each $c \in P \times \Gamma^*$:

$\delta(c, C) = \mathrm{IJOP}(\{c\}, C)$

The generalized pushdown successor (GPS) problem GPS(C) is to find for each
$c \in P \times \Gamma^*$:

$\delta(C, c) = \mathrm{IJOP}(C, \{c\})$

If S is the set of initial configurations of a program then GPS(S) solves for the set of
all reachable states in the program. If T is the set of error configurations (such as ones
that trigger an assertion violation), then GPP(T) is the set of all states that can lead to
an error configuration. Checking program safety reduces to checking that IJOP(S, T) $= \overline{0}$.
In case it is non-$\overline{0}$, $\omega(S, T)$ gives a finite number of counterexamples: valid paths from a
configuration in S to a configuration in T.

To illustrate the above definitions, we show how a Boolean program B with only global
variables can be encoded using a WPDS $(\mathcal{P}, \mathcal{S}, f)$. Let G be the set of global states of B. The
weight domain $\mathcal{S}$ is $(2^{G \times G}, \cup, ;, \emptyset, id)$, where the weights are relations (transformers) on the
set G. Combine is set union, extend is relational composition (composition of transformers),
$\overline{0}$ is the empty relation and $\overline{1} = id$ is the identity relation on G. The ICFG of B is encoded
using the PDS $\mathcal{P}$ and a statement st that labels edge e of B is encoded as the weight $[\![st]\!]$
on the rule corresponding to e. An example is shown in Fig. 2.11(a).
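The relational weight domain just described can be checked against Defn. 2.4.1 by brute force on a small instance. The following Python sketch is only an illustration: the two-element set G and all function names are assumptions introduced here, not part of the dissertation or of any WPDS library.

```python
from itertools import product

G = (0, 1)  # hypothetical two-element set of global states

def combine(a, b):
    """Combine: set union of two relations on G."""
    return a | b

def extend(a, b):
    """Extend: relational composition (apply a, then b)."""
    return frozenset((x, z) for (x, y1) in a for (y2, z) in b if y1 == y2)

ZERO = frozenset()                    # semiring 0: the empty relation
ONE = frozenset((g, g) for g in G)    # semiring 1: the identity relation

def all_weights():
    """Enumerate every binary relation on G (16 relations when |G| = 2)."""
    pairs = list(product(G, G))
    for mask in range(2 ** len(pairs)):
        yield frozenset(p for i, p in enumerate(pairs) if (mask >> i) & 1)

def check_axioms():
    """Exhaustively check items 1-3 of the semiring definition."""
    ws = list(all_weights())
    for a in ws:
        assert combine(a, a) == a                          # idempotence
        assert combine(a, ZERO) == a                       # 0 is neutral for combine
        assert extend(a, ONE) == a == extend(ONE, a)       # 1 is neutral for extend
        assert extend(a, ZERO) == ZERO == extend(ZERO, a)  # 0 annihilates
    for a, b, c in product(ws, repeat=3):
        assert extend(a, combine(b, c)) == combine(extend(a, b), extend(a, c))
        assert extend(combine(a, b), c) == combine(extend(a, c), extend(b, c))
    return True
```

Item 4 (no infinite ascending chains) holds trivially in this instance because the set of relations on a finite G is finite.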

The set of all data values that reach a node n can be calculated as follows: let S be
the singleton configuration consisting of the program's enter node, and let T be the set
$\{\langle p, n\,u\rangle \mid u \in \Gamma^*\}$. Let w = IJOP(S, T). If $w = \overline{0}$, then the node cannot be reached.
Otherwise, w captures the net transformation on the global state from when the program
started. The range of w, i.e., the set $\{g \in G \mid \exists g' \in G : (g', g) \in w\}$, is the set of valuations

Figure 2.11 (a) A WPDS that encodes the Boolean program from Fig. 2.2(a). (b) The
result of poststar($\langle p, n_1\rangle$) and prestar($\langle p, n_6\rangle$). The final state in each of the automata is
acc. (c) Definitions of the weights used in the figure.


that  reach  node  n.  For  example,  in  Fig.  2.11(a),  the  IJOP  weight  to  node  n6  is  the  weight 
wq  shown  in  Fig.  2.11(c).  Its  range  shows  that  either  x  =  3  and  y  =  3,  or  x  =  7  and  y  =  7. 

Because  T  can  be  any  regular  set,  one  can  also  answer  stack-qualified  queries  [83].  For 
example,  the  set  of  values  that  arise  at  node  n  when  its  procedure  is  called  from  call  site  m 
can be found by setting $T = \{\langle p, n\, m_r\, u\rangle \mid u \in \Gamma^*\}$, where $m_r$ is the return site for call site
m.

A  WPDS  with  a  weight  domain  that  has  a  finite  set  of  weights,  such  as  the  one  described 
above  for  Boolean  programs,  can  be  encoded  as  a  PDS.  However,  it  is  often  useful  to  use 
weights  because  they  can  be  symbolically  encoded.  Tools  such  as  Moped  [85]  and  Bebop 
[6]  use  BDDs  [14]  to  encode  sets  of  data  values,  which  allows  them  to  scale  to  a  large  number 
of  variables.  (Using  PDSs  for  Boolean  program  verification,  without  any  symbolic  encoding, 
is  generally  not  a  feasible  approach.) 

Dataflow analysis can also be encoded using WPDSs. The control flow is encoded using
a PDS, as done for Boolean programs; the dataflow transformer associated with an edge becomes
the weight associated with the rule corresponding to that edge; combine is defined as join
of transformers; and extend is defined as the reverse of function composition. In this case,
$JOVP_n$ (Defn. 2.1.2) is the same as $\mathrm{IJOP}(\{n_0\}, \{\langle p, n\,u\rangle \mid u \in \Gamma^*\})(v_0)$, where $n_0$ is the
entry point of the main procedure and $v_0$ is the initial dataflow value. This encoding is valid
under the restriction that all the dataflow transformers are distributive (over the lattice join)
and do not have any infinite ascending chains. These are the same restrictions used in other
work on interprocedural dataflow analysis [88].

2.4.1  Solving  for  the  IJOP  Value 

There  are  two  algorithms  for  solving  backward  and  forward  reachability  on  WPDSs, 
called prestar and poststar, respectively (in the unweighted case, these algorithms reduce to
computing  the  pre*  and  post*  sets  of  configurations).  Sets  of  weighted  configurations  are 
symbolically  represented  using  weighted  automata. 

Definition 2.4.5. Given a WPDS $\mathcal{W} = (\mathcal{P}, \mathcal{S}, f)$, a $\mathcal{W}$-automaton A is a $\mathcal{P}$-automaton,
where each transition in the automaton is labeled with a weight. The weight of a path in
the automaton is obtained by taking an extend of the weights on the transitions in the path
in either a forward or backward direction, depending on the context in which the automaton
is used. The automaton is said to accept a configuration $\langle p, u\rangle$ with weight w, denoted by
$A(\langle p, u\rangle)$, if w is the combine of weights of all accepting paths for u starting from state p in
the automaton. We call the automaton a backward $\mathcal{W}$-automaton if the weight of a path
is read backwards, and a forward $\mathcal{W}$-automaton otherwise.

Let A be an unweighted automaton and $\mathcal{L}(A)$ be the set of configurations accepted
by it. Then, prestar(A) produces a forward weighted automaton $A_{pre^*}$ as output, such that
$A_{pre^*}(c) = \mathrm{IJOP}(\{c\}, \mathcal{L}(A))$, whereas poststar(A) produces a backward weighted automaton
$A_{post^*}$ as output, such that $A_{post^*}(c) = \mathrm{IJOP}(\mathcal{L}(A), \{c\})$ [83]. These algorithms are similar
to those for PDSs; they only differ in the weight computations.

Notation. In a forward $\mathcal{W}$-automaton, we say that $p \xrightarrow{u} q$ with weight w if $u = \gamma_1 \gamma_2 \cdots \gamma_n$,
and there are transitions $(p_i, \gamma_i, p_{i+1})$ with weight $w_i$, where $p = p_1$ and $q = p_{n+1}$, and
$w = w_1 \otimes w_2 \otimes \cdots \otimes w_n$. The same holds for a backward automaton, except that w should
equal $w_n \otimes \cdots \otimes w_2 \otimes w_1$. The operation of adding a transition t with weight w to a weighted
automaton A is carried out as follows: if t does not exist in A, then t is simply added to
the transition set of A; otherwise, if t exists with weight w', then t's weight is updated to
$w \oplus w'$.

Algorithm prestar. The forward weighted automaton $A_{pre^*}$ can be constructed from A
using the following saturation rule: If $r = \langle p, \gamma\rangle \hookrightarrow \langle p', u\rangle$ is a rule in the PDS and $p' \xrightarrow{u} q$ in
the current automaton with weight w, then add a transition $(p, \gamma, q)$ to the automaton with
weight $f(r) \otimes w$.
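The prestar saturation rule can be sketched in a few lines of Python. This is a naive fixpoint loop rather than the worklist-based algorithm of [83], and everything concrete in it is an assumption made for illustration: the min-plus weights, the tuple encoding of rules as (p, gamma, p', word), the single control state "p", and the four example rules.

```python
import math

ZERO, ONE = math.inf, 0      # illustrative weights: combine = min, extend = +

def combine(a, b):
    return min(a, b)

def extend(a, b):
    return a + b

def reach(trans, state, word):
    """Return {q: w} such that state --word--> q with weight w (read forwards)."""
    frontier = {state: ONE}
    for sym in word:
        nxt = {}
        for (src, s, dst), w in trans.items():
            if s == sym and src in frontier:
                cand = extend(frontier[src], w)
                nxt[dst] = combine(nxt.get(dst, ZERO), cand)
        frontier = nxt
    return frontier

def prestar(rules, trans):
    """Saturate: for a rule r = (p, gamma) -> (p2, word) and p2 --word--> q with
    weight w, add (p, gamma, q) with weight f(r) extend w, combining with any
    weight already stored on that transition."""
    trans = dict(trans)
    changed = True
    while changed:
        changed = False
        for (p, gamma, p2, word), fr in rules.items():
            for q, w in reach(trans, p2, word).items():
                key = (p, gamma, q)
                old = trans.get(key, ZERO)
                new = combine(old, extend(fr, w))
                if new != old:
                    trans[key] = new
                    changed = True
    return trans

# Hypothetical PDS: a -> b, b calls (pushes c d), c returns, d -> e; every rule costs 1.
rules = {
    ("p", "a", "p", ("b",)): 1,
    ("p", "b", "p", ("c", "d")): 1,
    ("p", "c", "p", ()): 1,
    ("p", "d", "p", ("e",)): 1,
}
target = {("p", "e", "acc"): ONE}   # automaton accepting the configuration (p, e)
result = prestar(rules, target)
```

With these weights, the transition ("p", "a", "acc") in `result` carries the length of the shortest rule sequence from the configuration with stack a to the one with stack e.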

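Algorithm poststar. The backward weighted automaton $A_{post^*}$ can be constructed from A
by performing Phase I and then saturating via the rules given in Phase II:

• Phase I. For each pair $(p', \gamma')$ such that $\mathcal{P}$ contains at least one rule of the form
  $\langle p, \gamma\rangle \hookrightarrow \langle p', \gamma'\gamma''\rangle$, add a new state $p'_{\gamma'}$.

• Phase II (saturation phase). (The symbol $\rightsquigarrow^{\gamma}$ denotes the relation
  $(\xrightarrow{\varepsilon})^* \xrightarrow{\gamma} (\xrightarrow{\varepsilon})^*$.)

  - If $r = \langle p, \gamma\rangle \hookrightarrow \langle p', \varepsilon\rangle \in \Delta$ and $p \rightsquigarrow^{\gamma} q$ with weight w in the current automaton,
    add a transition $(p', \varepsilon, q)$ with weight $w \otimes f(r)$.

  - If $r = \langle p, \gamma\rangle \hookrightarrow \langle p', \gamma'\rangle \in \Delta$ and $p \rightsquigarrow^{\gamma} q$ with weight w in the current automaton,
    add a transition $(p', \gamma', q)$ with weight $w \otimes f(r)$.

  - If $r = \langle p, \gamma\rangle \hookrightarrow \langle p', \gamma'\gamma''\rangle \in \Delta$ and $p \rightsquigarrow^{\gamma} q$ with weight w in the current automaton,
    add the transitions $(p', \gamma', p'_{\gamma'})$ and $(p'_{\gamma'}, \gamma'', q)$ with weights $\overline{1}$ and $w \otimes f(r)$, respectively.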

An  efficient  implementation  of  this  algorithm  dynamically  maintains  the  epsilon  closure 
of  the  automaton  so  that  weights  under  the  transition  relation  can  be  read  off 
efficiently  [83]. 

Examples  are  shown  in  Fig.  2.11(b).  One  thing  to  note  here  is  how  the  poststar  automaton 
works.  The  procedure  bar  is  analyzed  independently  of  its  calling  context  (i.e.,  without 
knowing the exact value of x), which generates the transitions between p and pnr. The
calling context of bar, which determines the input values to bar, is represented by the
transitions that leave state pnr. This is how, for instance, the automaton records that x = 3
and y = 3 at node $n_8$ when bar is called from node $n_2$.

We now provide some intuition into why one needs both forwards and backwards
automata. Consider the automata shown in Fig. 2.11(b). For the poststar automaton, when
one follows a path that accepts the configuration $\langle p, n_8\, n_4\rangle$, the transition $(p, n_8, q)$ comes
before $(q, n_4, acc)$. However, the former transition describes the transformation inside bar,
which happens after the transformation performed in reaching the call site at $n_4$ (which is
stored on $(q, n_4, acc)$). Because the transformation for the calling context happens earlier
in the program, but its transitions appear later in the automaton, the weights are read
backwards. For the prestar automaton, the weight on $(p, n_4, acc)$ is the transformation for
going from $n_4$ to $n_6$, which occurs after the transformation inside bar. Thus, it is a forwards
automaton.

These  saturation-based  algorithms  can  be  applied  on  weighted  automata  as  well.  In  that 
case,  one  can  prove  the  following.  (Define  A(c)  to  be  0  if  A  does  not  accept  c.) 

Lemma 2.4.6. If A is a forward-weighted automaton and $A_{pre^*}$ is the result of running
prestar on A, then for every configuration c:

$A_{pre^*}(c) = \bigoplus_{c'} \{ v(\sigma) \otimes A(c') \mid \sigma \in paths(c, c') \}$

If A is a backward-weighted automaton and $A_{post^*}$ is the result of running poststar on A,
then for every configuration c:

$A_{post^*}(c) = \bigoplus_{c'} \{ A(c') \otimes v(\sigma) \mid \sigma \in paths(c', c) \}$

The  path_summary  Algorithm 

Once the weighted automata $A_{pre^*}$ and $A_{post^*}$ are computed, we still need to be able to
compute the weight with which they accept a particular configuration, or a set of configurations.
Recall that A(c) is defined as the combine of weights of all accepting paths for c. We
define $A(C) = \bigoplus \{A(c) \mid c \in C\}$. This allows one to solve for IJOP(S, T) for configuration
sets S and T by computing either poststar(S)(T) or prestar(T)(S). The algorithm that can
read off these weights from a weighted automaton is called the path_summary algorithm.

Definition 2.4.7. For a weighted automaton A, the weight path_summary(A) is defined as
$A(\Gamma^*)$, i.e., the combine over the weights of all accepting paths of A.

The weight A(C) can be computed as path_summary($A \cap A_C$), where $A_C$ is an unweighted
automaton that accepts the set of configurations C and the intersection operation is carried
out as for unweighted automata, except that the weights of A are retained.

The path_summary weight is computed in the same manner as intraprocedural dataflow
analysis (Section 2.1.4). We restrict attention to forward weighted automata. The algorithm
is similar for backward weighted automata. Let A be a forward $\mathcal{W}$-automaton; let Q be its
set of states; and F be its set of final states. Let l(q) be a weight associated with state $q \in Q$.
Initialize l(q) to $\overline{0}$ for all $q \in Q - F$ and $\overline{1}$ for $q \in F$. Then use the following saturation rule:
for a transition $(p, \gamma, q)$ with weight w, update l(p) to $l(p) \oplus (w \otimes l(q))$. Once saturation
finishes, i.e., no weight changes, then path_summary(A) $= l(q_0)$, where $q_0$ is the unique initial
state of A.
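The saturation rule for path_summary on a forward weighted automaton can be sketched as follows. The code is a minimal illustration under stated assumptions: it uses min-plus weights (combine = min, extend = +), a dictionary from transitions to weights, and a made-up four-transition automaton; none of these names come from the dissertation.

```python
import math

ZERO, ONE = math.inf, 0      # illustrative weights: combine = min, extend = +

def combine(a, b):
    return min(a, b)

def extend(a, b):
    return a + b

def path_summary(states, finals, trans, q0):
    """Combine over the weights of all accepting paths from q0 (forward automaton)."""
    # l(q) starts at 1 for final states and at 0 for every other state
    l = {q: (ONE if q in finals else ZERO) for q in states}
    changed = True
    while changed:
        changed = False
        for (p, _sym, q), w in trans.items():
            new = combine(l[p], extend(w, l[q]))   # l(p) := l(p) + (w x l(q))
            if new != l[p]:
                l[p] = new
                changed = True
    return l[q0]

# Hypothetical automaton with a self-loop on q1
states = {"q0", "q1", "acc"}
finals = {"acc"}
trans = {
    ("q0", "a", "q1"): 2,
    ("q1", "b", "acc"): 3,
    ("q0", "c", "acc"): 7,
    ("q1", "d", "q1"): 1,
}
summary = path_summary(states, finals, trans, "q0")
```

Here the accepting paths from q0 have weights 2 + 3 (possibly with extra trips around the loop) and 7, so the combine is the smaller of the two.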

Abstract  Grammar  Problems 

Just  as  in  the  case  of  PDSs,  context-free  grammars  can  be  used  to  gain  more  insight  into 
WPDS  reachability  problems.  The  following  presents  a  weighted  problem  on  context-free 
grammars. 

Definition 2.4.8. [83] Let $(S, \sqcup)$ be a join semilattice. An abstract grammar over $(S, \sqcup)$
is a collection of context-free grammar productions, where each production $\theta$ has the form
$X_0 \rightarrow g_\theta(X_1, \ldots, X_k)$. Parentheses, commas, and $g_\theta$ are terminal symbols. Every production
$\theta$ is associated with a function $g_\theta : S^k \rightarrow S$. Thus, every string $\alpha$ of terminal symbols derived
in this grammar denotes a composition of functions, and corresponds to a unique value in S,
which we call $val(\alpha)$. Let $\mathcal{L}(X)$ denote the strings of terminals derivable from a non-terminal
X. The abstract grammar problem is to compute, for each non-terminal X, the value

(1)  $t_1 \rightarrow g_1(\varepsilon)$      $g_1 = w_1$
(2)  $t_1 \rightarrow g_2(t_2)$      $g_2 = \lambda x.\, w_2 \otimes x$
(3)  $t_2 \rightarrow g_3(t_1)$      $g_3 = \lambda x.\, w_3 \otimes x$
(4)  $t_3 \rightarrow g_4(t_2)$      $g_4 = \lambda x.\, w_4 \otimes x$

Figure 2.12 A simple abstract grammar with four productions.


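     Production                                                                   for each

(1)  $\mathit{PopSeq}_{(q,\gamma,q')} \rightarrow g_1(\varepsilon)$                                   $(q, \gamma, q') \in \rightarrow_0$
     $g_1 = \overline{1}$

(2)  $\mathit{PopSeq}_{(p,\gamma,p')} \rightarrow g_2(\varepsilon)$                                   $r = \langle p, \gamma\rangle \hookrightarrow \langle p', \varepsilon\rangle \in \Delta$
     $g_2 = f(r)$

(3)  $\mathit{PopSeq}_{(p,\gamma,q)} \rightarrow g_3(\mathit{PopSeq}_{(p',\gamma',q)})$                    $r = \langle p, \gamma\rangle \hookrightarrow \langle p', \gamma'\rangle \in \Delta$, $q \in Q$
     $g_3 = \lambda x.\, f(r) \otimes x$

(4)  $\mathit{PopSeq}_{(p,\gamma,q)} \rightarrow g_4(\mathit{PopSeq}_{(p',\gamma',q')}, \mathit{PopSeq}_{(q',\gamma'',q)})$   $r = \langle p, \gamma\rangle \hookrightarrow \langle p', \gamma'\gamma''\rangle \in \Delta$, $q, q' \in Q$
     $g_4 = \lambda x.\lambda y.\, f(r) \otimes x \otimes y$

Figure 2.13 An abstract grammar problem for solving GPP.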


$JOD(X) = \bigsqcup_{\alpha \in \mathcal{L}(X)} val(\alpha)$. This value is called the join-over-all-derivations value for
X.

We define abstract grammars over the join semilattice $(D, \oplus)$, where D is a set of weights.
An example is shown in Fig. 2.12. The non-terminal $t_3$ can derive the string $\alpha = g_4(g_3(g_1))$
and $val(\alpha) = w_4 \otimes w_3 \otimes w_1$.
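One way to make the JOD value concrete is to solve the grammar of Fig. 2.12 by iterating its productions to a fixpoint. The sketch below assumes that chaotic iteration is an acceptable way to compute JOD for distributive production functions over a domain without infinite ascending chains; it also assumes min-plus weights and the hypothetical values w1..w4 = 1..4, none of which appear in the text.

```python
import math

ZERO = math.inf              # illustrative weights: combine = min, extend = +

def combine(a, b):
    return min(a, b)

def extend(a, b):
    return a + b

w1, w2, w3, w4 = 1, 2, 3, 4  # hypothetical weights for the four productions

# Each production: (left-hand side, function g, right-hand-side non-terminals)
productions = [
    ("t1", lambda: w1, ()),                         # (1) t1 -> g1(eps)
    ("t1", lambda x: extend(w2, x), ("t2",)),       # (2) t1 -> g2(t2)
    ("t2", lambda x: extend(w3, x), ("t1",)),       # (3) t2 -> g3(t1)
    ("t3", lambda x: extend(w4, x), ("t2",)),       # (4) t3 -> g4(t2)
]

def solve_jod(prods):
    """Iterate the productions until no non-terminal's value changes."""
    jod = {lhs: ZERO for lhs, _g, _rhs in prods}
    changed = True
    while changed:
        changed = False
        for lhs, g, rhs in prods:
            new = combine(jod[lhs], g(*(jod[x] for x in rhs)))
            if new != jod[lhs]:
                jod[lhs] = new
                changed = True
    return jod

jod = solve_jod(productions)
```

For t3 the result agrees with the value of the derivation $g_4(g_3(g_1))$, which is w4 + w3 + w1 under these weights; longer derivations that go around the t1/t2 cycle only produce larger sums and are absorbed by the minimum.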

The abstract grammar for solving GPP is shown in Fig. 2.13. The grammar has one
non-terminal $\mathit{PopSeq}_t$ for each possible transition $t \in Q \times \Gamma \times Q$ of $A_{pre^*}$. It is based on the
unweighted grammar shown in Fig. 2.8, which was shown to capture all paths in a PDS. The
following lemma follows from Lem. 2.3.9.

Lemma 2.4.9. For a transition t in the automaton that results from running prestar(A),
the weight on t is exactly $JOD(\mathit{PopSeq}_t)$.

The abstract grammar for solving GPS is shown in Fig. 2.14. The grammar has one
non-terminal $\mathit{PushSeq}_t$ for each possible transition $t \in (Q \cup Q_{mid}) \times \Gamma \times (Q \cup Q_{mid})$ of $A_{post^*}$.
It is based on the unweighted grammar shown in Fig. 2.10. The following lemma follows
from Lem. 2.3.11.

Figure 2.14 An abstract grammar problem for solving GPS.

Lemma 2.4.10. For a transition t in the automaton that results from running poststar(A),
the weight on t is exactly $JOD(\mathit{PushSeq}_t)$.

Complexity 

The following lemma states the complexity of poststar by the algorithm of Reps et al.
[83], which is the same as the one described earlier, but with a few optimizations. We will
assume that the times to perform an $\otimes$ and a $\oplus$ are the same, and use the notation $O_s(\cdot)$ to
denote the time bound in terms of semiring operations. The height of a weight domain is
defined to be the length of the longest ascending chain in the domain.

Lemma 2.4.11. [83] Given a WPDS with PDS $\mathcal{P} = (P, \Gamma, \Delta)$, if $A = (Q, \Gamma, \rightarrow, P, F)$ is a
$\mathcal{P}$-automaton that accepts an input set of configurations, poststar produces a backward weighted
automaton with at most $|Q| + |\Delta|$ states in time $O_s(|P||\Delta|(|Q_0| + |\Delta|)H + |P||\lambda_0|H)$, where
$Q_0 = Q \setminus P$, $\lambda_0$ is the set of all transitions leading from states in $Q_0$, and H is the height
of the weight domain.

Approximate  Analysis 

Among the properties imposed by a weight domain, one important property is distributivity
(Defn. 2.4.1, item 2). This is a common requirement for a precise analysis, which
also arises in various coincidence theorems for dataflow analysis [44, 88, 52]. Sometimes
this requirement is too strict and may be relaxed to monotonicity, i.e., for all $a, b, c \in D$,
$a \otimes (b \oplus c) \sqsupseteq (a \otimes b) \oplus (a \otimes c)$ and $(a \oplus b) \otimes c \sqsupseteq (a \otimes c) \oplus (b \otimes c)$. In such cases, the IJOP
computation may not be precise, but it will be safe under the partial order $\sqsubseteq$.

2.4.2  Weight  Domains 

This  section  gives  a  few  weight  domains  (i.e.,  bounded  idempotent  semirings)  and  the 
kind  of  analysis  they  permit  when  used  in  a  WPDS. 

Definition 2.4.12. Let $\mathbb{B}$ be the set of Boolean values {true, false}. The Boolean weight
domain is defined as $(\mathbb{B}, \vee, \wedge, false, true)$.

A Boolean weight domain is the most trivial example of a weight domain. A WPDS with
such a weight domain effectively reduces to its underlying PDS (after deleting rules with $\overline{0}$
weight): $\mathrm{IJOP}(c_1, c_2) = true$ if and only if there is a path in the PDS from $c_1$ to $c_2$. In this
dissertation, when we present an algorithm for WPDSs, the same algorithm can be reworked
for PDSs by considering this weight domain.

Definition 2.4.13. If G is a finite set, then the relational weight domain on G is defined
as $(2^{G \times G}, \cup, ;, \emptyset, id)$: weights are binary relations on G, combine is union, extend is relational
composition, $\overline{0}$ is the empty relation, and $\overline{1}$ is the identity relation on G.

This weight domain is the one used for encoding Boolean programs, as shown in Section
2.4. Weights in such a semiring can be encoded symbolically, using BDDs [85]. The extend
and combine operations can be implemented efficiently on BDDs.

Definition 2.4.14. The minpath semiring is the weight domain $\mathcal{M} = (\mathbb{N} \cup
\{\infty\}, \min, +, \infty, 0)$: weights are non-negative integers including "infinity", combine is
minimum, and extend is addition.

If  all  rules  of  a  WPDS  are  given  the  weight  1  from  this  semiring  (different  from  the 
semiring  weight  1,  which  is  the  integer  0),  then  the  IJOP  weight  between  two  configurations 
is  the  length  of  the  shortest  valid  path  (shortest  valid  rule  sequence)  between  them. 
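A small Python sketch makes the shortest-path reading concrete. It evaluates $v(\sigma)$ and the combine over an explicitly listed, finite set of rule sequences; a real IJOP computation ranges over all valid paths and would use prestar or poststar instead. The rule names and the two paths are hypothetical.

```python
import math

ZERO, ONE = math.inf, 0      # semiring 0 and 1 of the minpath semiring

def combine(a, b):
    """Combine is minimum."""
    return min(a, b)

def extend(a, b):
    """Extend is addition."""
    return a + b

def value(sigma, f):
    """v(sigma): extend of the rule weights f(r), taken in order."""
    w = ONE
    for r in sigma:
        w = extend(w, f[r])
    return w

def join_over_paths(paths, f):
    """Combine of v(sigma) over a finite, explicitly given set of paths."""
    w = ZERO
    for sigma in paths:
        w = combine(w, value(sigma, f))
    return w

f = {"r1": 1, "r2": 1, "r3": 1}              # every rule gets the integer weight 1
paths = [["r1", "r2", "r3"], ["r1", "r3"]]   # two hypothetical valid rule sequences
shortest = join_over_paths(paths, f)
```

The empty set of paths yields the semiring $\overline{0}$ (infinity), matching the convention that an unreachable target has weight $\overline{0}$.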

Another  infinite  weight  domain,  which  is  based  on  the  minpath  semiring,  is  given  in  [56] 
and  was  shown  to  be  useful  for  debugging  programs. 

The  minpath  semiring  can  be  combined  with  a  relational  weight  domain,  for  example, 
to  find  the  shortest  (valid)  path  in  a  Boolean  program  (for  finding  the  shortest  trace  that 
exhibits  some  property). 

Definition 2.4.15. A weighted relation on a set S, weighted with semiring $(D, \oplus, \otimes, \overline{0}, \overline{1})$,
is a function from $(S \times S)$ to D. The composition of two weighted relations $R_1$ and $R_2$ is
defined as $(R_1 ; R_2)(s_1, s_3) = \bigoplus \{ w_1 \otimes w_2 \mid \exists s_2 \in S : w_1 = R_1(s_1, s_2), w_2 = R_2(s_2, s_3) \}$. The
union of the two weighted relations is defined as $(R_1 \cup R_2)(s_1, s_2) = R_1(s_1, s_2) \oplus R_2(s_1, s_2)$.
The identity relation is the function that maps each pair (s, s) to $\overline{1}$ and others to $\overline{0}$. The
reflexive transitive closure is defined in terms of these operations, as before. If $\rightarrow$ is a
weighted relation and $(s_1, s_2, w) \in \rightarrow$, then we write $s_1 \xrightarrow{w} s_2$.

Definition 2.4.16. If $\mathcal{S}$ is a weight domain with set of weights D and G is a finite set,
then the relational weight domain on $(G, \mathcal{S})$ is defined as $(2^{G \times G} \rightarrow D, \cup, ;, \emptyset, id)$: weights are
weighted relations on G and the operations are the corresponding ones for weighted relations.

If G is the set of global states of a Boolean program, then the relational weight domain
on $(G, \mathcal{M})$ can be used for finding the shortest trace: for each rule, if $R \subseteq G \times G$ is the
effect of executing the rule on the global state of the Boolean program, then associate the
following weight with the rule:

$\{ g_1 \xrightarrow{1} g_2 \mid (g_1, g_2) \in R \} \cup \{ g_1 \xrightarrow{\infty} g_2 \mid (g_1, g_2) \notin R \}$.

Then, if $w = \mathrm{IJOP}(C_1, C_2)$, the length of the shortest path that starts with global state g
from a configuration in $C_1$ and ends at global state g' in a configuration in $C_2$, is w(g, g')
(which would be $\infty$ if no path exists). (Moreover, if a finite-length path does exist, a witness
trace can be obtained to identify the elements of the path.)
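The weighted-relation operations and the shortest-trace weights can be sketched in Python as follows. The representation is an assumption made for brevity: a weighted relation over the minpath semiring is stored as a dictionary from pairs to integers, and a missing pair stands for the weight infinity. The one-variable Boolean program and the two rule sequences are hypothetical.

```python
import math

ZERO, ONE = math.inf, 0      # minpath semiring: combine = min, extend = +

def compose(r1, r2):
    """(R1 ; R2)(s1, s3): minimum over s2 of R1(s1, s2) + R2(s2, s3)."""
    out = {}
    for (s1, s2), w1 in r1.items():
        for (t2, s3), w2 in r2.items():
            if s2 == t2:
                out[(s1, s3)] = min(out.get((s1, s3), ZERO), w1 + w2)
    return out

def union(r1, r2):
    """(R1 u R2)(s1, s2): minimum of the two weights."""
    out = dict(r1)
    for pair, w in r2.items():
        out[pair] = min(out.get(pair, ZERO), w)
    return out

def rule_weight(effect):
    """Weight of a rule whose effect on the global state is the relation `effect`:
    related pairs get weight 1; unrelated pairs stay absent (weight infinity)."""
    return {pair: 1 for pair in effect}

# Hypothetical Boolean program with one global x, so G = {0, 1}
set_true = rule_weight({(0, 1), (1, 1)})   # x := true
negate = rule_weight({(0, 1), (1, 0)})     # x := !x
skip = rule_weight({(0, 0), (1, 1)})       # no-op

path_a = compose(set_true, negate)               # a two-rule sequence
path_b = compose(compose(skip, skip), negate)    # a three-rule sequence
w = union(path_a, path_b)                        # combine over both sequences
```

In this example `w[(1, 0)]` is 2 because the two-rule sequence already takes x = 1 to x = 0, while the pair (1, 1) is absent because neither sequence can end with x = 1 when started from x = 1.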

2.4.3  Verifying  Finite-State  Properties 

In  this  section,  we  give  an  instance  of  how  property  verification  can  be  converted  to  a 
reachability  problem,  which  is  a  very  basic  form  of  assertion  checking.  We  suppose  that  the 
property  is  given  in  the  form  of  a  finite-state  machine  and  the  abstract  model  of  the  program 
is  a  WPDS. 

The  property  is  supplied  as  a  finite-state  automaton  that  performs  transitions  on  ICFG 
nodes. The automaton has a designated error state, and runs (i.e., ICFG paths) that drive
it to the error state are considered potentially erroneous program executions. For instance,
the automaton shown in Fig. 2.15 can be used to verify the absence of null-pointer dereferences
(for a pointer p in the program) by matching automaton-edge labels against program
statements on ICFG nodes. For example, we would associate the transition label p = NULL
with every ICFG node that has this statement. The reader is referred to other papers for
more  examples  of  useful  finite-state  properties  [4,  21]. 

More formally, a program is abstracted to a WPDS $\mathcal{W} = (\mathcal{P}, \mathcal{S}, f)$, where $\mathcal{P} = (P, \Gamma, \Delta)$.
Let the initial configuration of the program be $c_0$. A property automaton A is the tuple
$(Q, \Gamma, \rightarrow, q_0, F)$, where Q is a finite set of control states, $\Gamma$ is the transition alphabet,
$\rightarrow\ \subseteq Q \times \Gamma \times Q$ is the transition relation, $q_0 \in Q$ is the initial state and $F \subseteq Q$ is the set of final
states. A word (in $\Gamma^*$) accepted by A is considered to be an erroneous program execution.
The verification problem is to find if there is a path $\sigma$ in $\mathcal{W}$ with non-$\overline{0}$ weight, starting from
$c_0$, such that the nodes visited by $\sigma$ are in the language of A.

Figure 2.15 A finite-state machine for checking null-pointer dereferences in a program.
The initial state of the machine is $s_1$. The label "p = &v" stands for the assignment of a
non-null address to the pointer p. We assume that the machine stays in the same state
when it has to transition on an undefined label.


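This problem can be solved by taking a cross-product of $\mathcal{W}$ and A. Define the relation
$R_\gamma \subseteq Q \times Q$ to be $\{(q_1, q_2) \mid (q_1, \gamma, q_2) \in \rightarrow\}$, i.e., $R_\gamma$ is the projection of the transition
relation of A to only those that fire on $\gamma$. Define a WPDS $\mathcal{W}' = (\mathcal{P}, \mathcal{S}', f')$, where $\mathcal{S}'$ is
the relational weight domain on $(Q, \mathcal{S})$ (Defn. 2.4.16), and $f'(r)$ is defined as follows: if
$r = \langle p, \gamma\rangle \hookrightarrow \langle p', u\rangle$ then $f'(r) = \{ q_1 \xrightarrow{f(r)} q_2 \mid (q_1, q_2) \in R_\gamma \}$.$^6$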

A path $\sigma$ in $\mathcal{W}$, with weight w, is accepted by A if and only if the same path in $\mathcal{W}'$ has
weight w' such that $(q_0, q_f, w) \in w'$ for some $q_f \in F$. Consequently, $\mathcal{W}$ has a path with non-$\overline{0}$
weight, starting from $c_0$, that is accepted by A if and only if $(q_0, q_f, w) \in \mathrm{IJOP}_{\mathcal{W}'}(\{c_0\}, \Gamma^*)$
for some $q_f \in F$, and $w \neq \overline{0}$. This shows that solving for the IJOP weight is sufficient to
verify finite-state properties on WPDSs.


$^6$ This construction is due to David Melski, and was used in an experimental version of the Path Inspector
[35].

Chapter 3

Extended  Weighted  Pushdown  Systems 

In Chapter 2, we covered some common abstract models and defined the join-over-all-paths
(JOP) value, and several variants (JOVP or IJOP), on those models. The JOP
value, for a node n, is the net transformation over all paths in the model that reach n. From
this value, the set of all reachable states at n (in the model) can be computed, and used
for checking assertions at node n. Thus, it is desirable to compute the precise JOP value.
The definition of JOP is declarative in nature, however, and cannot be directly computed
because it involves combining the effect of an unbounded number of paths. Nevertheless, under
certain conditions, the JOP value can be computed precisely.

Chapter  2  presented  two  results  in  this  direction.  First,  it  presented  the  Kam  and  Ullman 
coincidence  theorem  [44]  (Section  2.1)  that  provides  sufficient  conditions  under  which  the 
JOP  value  can  be  calculated  for  single-procedure  dataflow  models.  This  result  was  extended 
to multiple-procedure models by Sharir and Pnueli [88]. Second, the weighted pushdown
system  (WPDS)  model  also  specifies  certain  conditions  (namely,  that  the  weights  should 
come  from  a  bounded-idempotent  semiring),  which  when  satisfied,  imply  that  the  poststar 
and  prestar  algorithms  can  be  used  to  precisely  compute  the  JOP  weight.  However,  with 
all  these  models,  it  is  not  clear  how  to  encode  programs  with  multiple  procedures  and  local 
variables  in  such  a  way  that  all  these  conditions  are  satisfied.  (Note  that  we  only  considered 
examples  without  local  variables  in  Chapter  2.)  In  this  chapter,  we  study  an  abstract  model 
that  provides  a  straightforward  way  of  encoding  programs  with  local  variables,  and  show 
how  to  precisely  compute  JOP  values  in  the  model. 

In dataflow analysis, this challenge of incorporating local variables was addressed by
Knoop and Steffen [52] by extending Sharir and Pnueli's coincidence theorem to model
the  run-time  stack  of  a  program.  Their  work  is  summarized  in  Section  3.3.  Alternative 
techniques  for  handling  local  variables  have  been  proposed  in  [81,  84],  but  these  lose  certain 
relationships  between  local  and  global  variables. 

As shown in Chapter 2, WPDSs generalize dataflow analysis. For instance, in interprocedural
dataflow analysis, the JOVP value for a program node represents the set of all possible
reachable  states  at  that  node  regardless  of  its  calling  context.  Using  WPDSs,  one  can  answer 
“stack-qualified  queries”  that  calculate  the  set  of  states  that  can  occur  at  a  program  point 
for  a  given  regular  set  of  calling  contexts.  Moreover,  the  ability  to  represent  reachable  states 
along  with  their  stack  contents  (in  the  form  of  a  weighted  automaton)  will  be  critical  in 
designing  the  algorithms  and  techniques  presented  in  later  chapters. 

However,  as  with  Sharir  and  Pnueli’s  coincidence  theorem,  it  is  not  clear  if  WPDSs 
can  handle  local  variables  accurately.  In  this  chapter,  we  extend  the  WPDS  model  to  the 
Extended-WPDS or EWPDS model, which can accurately encode interprocedural analyses
on programs with local variables and answer stack-qualified queries about them. The EWPDS
model can be seen as generalizing WPDS in much the same way that Knoop and Steffen
generalized Sharir and Pnueli's coincidence theorem.

The  contributions  of  the  work  presented  in  this  chapter  can  be  summarized  as  follows: 

•  We  give  a  way  of  handling  local  variables  in  the  WPDS  model.  The  advantage  of  using 
WPDSs  is  that  they  give  a  way  of  calculating  IJOP  weights  that  hold  at  a  program 
node  for  a  particular  calling  context  (or  set  of  calling  contexts).  They  can  also  provide 
a  set  of  witness  program  execution  paths  that  justify  a  reported  dataflow  value. 

•  We  show  that  the  EWPDS  model  is  powerful  enough  to  capture  Knoop  and  Steffen’s 
coincidence  theorem.  In  particular,  this  means  that  we  can  calculate  the  IJOP  value  for 
any  distributive  dataflow-analysis  problem  for  which  the  domain  of  transfer  functions 
has no infinite ascending chains.

• We have extended the WPDS library [49] to support EWPDSs and used it in an
application that calculates affine relationships that hold between registers in x86 code
[3].

• We show how to encode several abstract models using EWPDSs. These abstract models
include Boolean programs (Section 3.5.1), affine programs (Section 3.5.2), and programs
with single-level pointers [61] (Section 3.5.3). This shows that the analysis of all
these  models  can  be  carried  out  using  EWPDS  reachability  algorithms.  It  also  gives 
us  something  new:  a  way  of  answering  stack-qualified  queries  on  all  these  models. 

The  rest  of  this  chapter  is  organized  as  follows:  Section  3.1  defines  the  EWPDS  model. 
Section  3.2  presents  algorithms  to  solve  reachability  queries  in  EWPDSs.  Section  3.3  presents 
Knoop  and  Steffen’s  coincidence  theorem  for  dataflow  analysis  and  shows  that  the  theorem 
can  also  be  obtained  using  EWPDSs.  Section  3.4  presents  experimental  results.  Section  3.5 
presents  various  applications  of  EWPDSs  by  showing  how  different  abstract  models  can  be 
encoded  using  EWPDSs.  Section  3.6  describes  related  work.  Section  3.7  has  proofs  of  the 
theorems  in  this  chapter. 

3.1  Defining  the  EWPDS  Model 

We  start  by  recalling  the  definitions  of  reachability  problems  on  WPDSs. 

Definition 3.1.1. Let $\mathcal{W} = (\mathcal{P}, \mathcal{S}, f)$ be a weighted pushdown system, where $\mathcal{P} = (P, \Gamma, \Delta)$,
and let $C \subseteq P \times \Gamma^*$ be a regular set of configurations. The generalized pushdown
predecessor (GPP) problem is to find for each $c \in P \times \Gamma^*$:

$$\delta(c) = \bigoplus \{\, v(\sigma) \mid \sigma \in paths(c, c'),\ c' \in C \,\}$$

The generalized pushdown successor (GPS) problem is to find for each $c \in P \times \Gamma^*$:

$$\delta(c) = \bigoplus \{\, v(\sigma) \mid \sigma \in paths(c', c),\ c' \in C \,\}$$

We  aim  to  solve  each  of  these  problems  on  the  EWPDS  model  as  well.  These  require 
computing the weight of a rule sequence. Rule sequences, in general, represent interprocedural
paths in a program, and such paths can have unfinished procedure calls, e.g., when
the path ends in the middle of a called procedure. To compute the weight of such paths, we
have  to  maintain  information  about  local  variables  of  all  unfinished  procedures  that  appear 
on  the  path. 

We  allow  for  local  variables  to  be  stored  at  call  sites  and  then  use  special  merge  functions 
to  appropriately  merge  them  with  the  value  returned  by  a  procedure.  Merge  functions  are 
defined  as  follows: 

Definition 3.1.2. A function $g : D \times D \to D$ is a merge function with respect to a
bounded idempotent semiring $\mathcal{S} = (D, \oplus, \otimes, \overline{0}, \overline{1})$ if it satisfies the following properties.

1. Strictness. For all $a \in D$, $g(\overline{0}, a) = g(a, \overline{0}) = \overline{0}$.

2. Distributivity. The function distributes over $\oplus$ in each argument. For all $a, b, c \in D$,

$$g(a \oplus b, c) = g(a, c) \oplus g(b, c) \quad \text{and} \quad g(a, b \oplus c) = g(a, b) \oplus g(a, c)$$

3. Path Extension.¹ For all $a, b, c \in D$, $g(a \otimes b, c) = a \otimes g(b, c)$.

For a set of pushdown rules $\Delta$, we use $\Delta_i \subseteq \Delta$ to denote the set of all rules with $i$ stack
symbols on the right-hand side.

Definition 3.1.3. An extended weighted pushdown system is a quadruple $\mathcal{W}_e =
(\mathcal{P}, \mathcal{S}, f, g)$ where $(\mathcal{P}, \mathcal{S}, f)$ is a weighted pushdown system and $g : \Delta_2 \to \mathcal{G}$ assigns a merge
function to each rule in $\Delta_2$, where $\mathcal{G}$ is the set of all merge functions on the semiring $\mathcal{S}$. We
will write $g_r$ as a shorthand for $g(r)$.

Note that a push rule has both a weight and a merge function associated with it. The
merge functions are used to combine the effects of a called procedure with those made by
the calling procedure just before the call. As an example, refer to Fig. 3.1, which is similar
to the dataflow model shown in Fig. 2.1, except that it has local variables as well. The
dataflow values used for this model are environments of the form $Env = (Var \to \mathbb{Z}^{\top}) \cup \{\bot\}$,
with join ($\sqcup$) defined pointwise. This model can be encoded as an EWPDS $(\mathcal{P}, \mathcal{S}, f, g)$ as

¹This property can be too restrictive in some cases; Section 3.2.3 discusses how this property may be
dispensed with. In most cases, however, the path-extension property does hold.

    int y;

    void main() {
      n1:     int a = 5;
      n2:     y = 1;
      n3,n4:  f(a);
      n5:     if (...) {
      n6:       a = 2;
      n7,n8:    f(a);
              }
      n9:     ...;
    }

    void f(int b) {
      n10:    if (...)
      n11:      y = 2;
              else
      n12:      y = b;
    }

Figure 3.1 A program fragment and its ICFG. For all unlabeled edges, the environment
transformer is $\lambda e.e$. The statements labeled "..." are assumed not to change any of the
declared variables.


follows: the control-flow of the model is encoded using the PDS $\mathcal{P}$. The weight domain $\mathcal{S}$ is
$(D, \oplus, \otimes, \overline{0}, \overline{1})$ where $D = (Env \to Env)$ is the set of all environment transformers that are
$\bot$-strict, i.e., for all $d \in D$, $d(\bot) = \bot$. The semiring operations and constants are defined
as follows:

$$\overline{0} = \lambda e.\bot \qquad w_1 \oplus w_2 = \lambda e.\, w_1(e) \sqcup w_2(e)$$

$$\overline{1} = \lambda e.e \qquad w_1 \otimes w_2 = w_2 \circ w_1$$

The weights for the PDS rules are the corresponding edge labels in Fig. 2.1. Merge
functions are two-argument functions on weights. The merge function for call site $n_3$ will
receive two environment transformers: one that summarizes the effect of the caller from its
entry point to the call site ($e_{main}$ to $n_3$) and one that summarizes the effect of the called
procedure ($e_f$ to $x_f$). It then has to output the transformer that summarizes the effect of the
caller from its entry point to the return site ($e_{main}$ to $n_4$). We define it as follows:

$$g(w_1, w_2) = \begin{cases} \overline{0} & \text{if } w_1 = \overline{0} \text{ or } w_2 = \overline{0} \\ \lambda e.\, e[a \mapsto w_1(e)(a),\ y \mapsto (w_1 \otimes w_2)(e)(y)] & \text{otherwise} \end{cases}$$

It copies over the value of the local variable $a$ from the call site, and gets the value of $y$ from
the called procedure. Because the merge function has access to the environment transformer
just before the call, we do not need to pass the value of local variable $a$ into procedure $f$.
This is achieved by the weight on the call rule at $n_3$ that maps $a$ to $\top$. Moreover, the merge
function also ensures that the local variables of $f$, which are present in weight $w_2$, do not
get passed into main.
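The merge function for call site $n_3$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: environments are modelled as dictionaries, the unreachable environment $\bot$ as `None`, and the names `strict`, `assign`, and `merge_n3` are hypothetical helpers that do not come from the WPDS library. The callee summary used here is the one for the branch through $n_{11}$.

```python
TOP = "T"    # abstract value "not a constant" (assumed encoding)
BOT = None   # the special unreachable environment (assumed encoding)

def strict(fn):
    """Wrap fn so that it maps the bottom environment to bottom."""
    return lambda e: BOT if e is BOT else fn(e)

ZERO = strict(lambda e: BOT)   # semiring 0: lambda e. bottom
ONE = strict(lambda e: e)      # semiring 1: lambda e. e

def extend(w1, w2):
    """w1 (x) w2 = w2 o w1, i.e. run w1 first, then w2."""
    if w1 is ZERO or w2 is ZERO:
        return ZERO
    return lambda e: w2(w1(e))

def assign(**updates):
    """Transformer lambda e. e[x -> v, ...] for constant values v."""
    return strict(lambda e: {**e, **updates})

def merge_n3(w1, w2):
    """Merge function for the call at n3: a comes from the caller, y from the callee."""
    if w1 is ZERO or w2 is ZERO:
        return ZERO
    def merged(e):
        at_call = w1(e)                # caller's environment just before the call
        at_exit = extend(w1, w2)(e)    # environment at the callee's exit
        if at_call is BOT or at_exit is BOT:
            return BOT
        return {**e, "a": at_call["a"], "y": at_exit["y"]}
    return strict(merged)

caller = assign(a=5, y=1)    # summary from e_main to n3
callee = assign(y=2)         # summary from e_f to x_f through n11
start = {"a": TOP, "b": TOP, "y": TOP}
result = merge_n3(caller, callee)(start)
```

In this sketch `result` keeps the caller's value of `a`, takes `y` from the callee, and leaves `b` as it was in the starting environment, which is the behaviour the text describes for the return to main.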

The main change that EWPDSs require over WPDSs is the way the weight of a path
(rule sequence) is calculated because the merge functions have to be incorporated. The
technical difference, formalized below, is that paths have to be “parsed” in EWPDSs to find
matching calls and returns so that the appropriate weights are calculated to be passed to
the merge functions. For instance, consider the path $[e_{main}, n_1, n_2, n_3, e_f, n_{10}, n_{11}, x_f, n_4]$. We
first need to calculate the weights of the sub-paths $[e_f, n_{10}, n_{11}, x_f]$ and $[e_{main}, n_1, n_2, n_3]$, and
then pass these weights to the merge function. In WPDSs, there was no such order imposed
in calculating the weight of a path.

To formalize this notion, we redefine the generalized pushdown predecessor and successor
problem by changing how we define the value of a rule sequence. If $\sigma \in \Delta^*$ with $\sigma =
[r_1, r_2, \ldots, r_k]$ then let $(r\ \sigma)$ denote the sequence $[r, r_1, \ldots, r_k]$. Also, let $[\,]$ denote the
empty sequence. Consider the context-free grammar shown in Fig. 3.2. $\sigma_s$ is simply $R_1^*$.
$\sigma_b$ represents a balanced sequence of rules that have matched calls ($R_2$) and returns ($R_0$)
with any number of rules from $\Delta_1$ in between. $\sigma_i$ is just $(R_2 \mid \sigma_b)^+$ in regular-language
terminology, and represents sequences that increase stack height. $\sigma_d$ is $(R_0 \mid \sigma_b)^+$ and
represents rule sequences that decrease stack height. $\sigma_a$ can derive any rule sequence. We
will use this grammar to define the value of a rule sequence.

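$$\begin{array}{lll}
R_0 \to r \quad (r \in \Delta_0) & \sigma_s \to [\,] \mid R_1 \mid \sigma_s\ \sigma_s & \sigma_i \to R_2 \mid \sigma_b \mid \sigma_i\ \sigma_i \\
R_1 \to r \quad (r \in \Delta_1) & \sigma_b \to \sigma_s \mid \sigma_b\ \sigma_b & \sigma_d \to R_0 \mid \sigma_b \mid \sigma_d\ \sigma_d \\
R_2 \to r \quad (r \in \Delta_2) & \phantom{\sigma_b \to{}} \mid R_2\ \sigma_b\ R_0 & \sigma_a \to \sigma_d\ \sigma_i
\end{array}$$

Figure 3.2 Grammar used for parsing rule sequences. The start symbol of the grammar is
$\sigma_a$.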

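Definition 3.1.4. Given an EWPDS $\mathcal{W}_e = (\mathcal{P}, \mathcal{S}, f, g)$ and a rule sequence $\sigma \in \Delta^*$, we
define its value $v(\sigma)$ by parsing $\sigma$ according to the grammar of Fig. 3.2 and computing the
weight using its derivation tree as follows:

1. $v(r) = f(r)$

2. $v([\,]) = \overline{1}$

3. $v(\sigma_s\ \sigma_s) = v(\sigma_s) \otimes v(\sigma_s)$

4. $v(\sigma_b\ \sigma_b) = v(\sigma_b) \otimes v(\sigma_b)$

5. $v(R_2\ \sigma_b\ R_0) = g_{R_2}(\overline{1},\ v(\sigma_b) \otimes v(R_0))$

6. $v(\sigma_d\ \sigma_d) = v(\sigma_d) \otimes v(\sigma_d)$

7. $v(\sigma_i\ \sigma_i) = v(\sigma_i) \otimes v(\sigma_i)$

8. $v(\sigma_d\ \sigma_i) = v(\sigma_d) \otimes v(\sigma_i)$

Here we have used $g_{R_2}$ as a shorthand for $g_r$ where $r$ is the terminal derived by $R_2$.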


The main thing to note in the above definition is the application of merge functions on
balanced sequences. The path-extension property of merge functions allows us to compute
$g(w_1, w_2)$ as $w_1 \otimes g(\overline{1}, w_2)$. An alternative grammar is given in Section 3.2.3 for the case when the
path-extension property does not hold. Because the grammar presented in Fig. 3.2 is ambiguous,
there might be many parsings of the same rule sequence, but all of them would produce the
same value because the extend operation is associative and there is a unique way to balance
$R_2$s with $R_0$s.
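The valuation above can be illustrated with a small evaluator that matches calls with returns while scanning a rule sequence. The sketch below assumes a min-plus weight domain (extend is addition, $\overline{1}$ is 0) and a hypothetical tuple encoding of rules, `("step", w)`, `("push", w, merge)`, and `("pop", w)`; neither choice comes from the text. Each balanced call contributes $g_r(\overline{1}, \cdot)$ as in case 5, and each unfinished call contributes the weight of its push rule.

```python
ONE = 0   # semiring 1 of the min-plus domain (assumption for this sketch)

def extend(a, b):
    """Extend in the min-plus domain is addition."""
    return a + b

def value(seq):
    """Value of a rule sequence in the style of Defn. 3.1.4 (assumes path extension)."""
    frames = [ONE]      # frames[-1] accumulates the innermost open procedure
    open_calls = []     # push rules whose matching pop has not been seen yet
    for rule in seq:
        kind, weight = rule[0], rule[1]
        if kind == "push":
            open_calls.append(rule)
            frames.append(ONE)
        elif kind == "pop" and open_calls:
            # Balanced call: apply the merge function to (1, callee weight).
            inner = extend(frames.pop(), weight)
            merge = open_calls.pop()[2]
            frames[-1] = extend(frames[-1], merge(ONE, inner))
        else:
            # Intraprocedural step or unbalanced pop: accumulate the weight.
            frames[-1] = extend(frames[-1], weight)
    total = frames[0]
    for call, acc in zip(open_calls, frames[1:]):
        # Unfinished calls contribute the push-rule weight, then the callee prefix.
        total = extend(extend(total, call[1]), acc)
    return total

call = ("push", 5, lambda w1, w2: w1 + w2)   # hypothetical merge: overhead not charged on return
finished = [("step", 1), call, ("step", 2), ("pop", 1)]
unfinished = [("step", 1), call, ("step", 2)]
```

With these inputs the finished call is valued through its merge function, while the unfinished one is valued through the weight of the push rule, so the two sequences can receive different costs for the same call.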

The generalized pushdown problems GPP and GPS for EWPDSs are the same as those
for WPDSs except for the changed definition of the value of a rule sequence. If we let each
merge function be $g_r(w_1, w_2) = w_1 \otimes f(r) \otimes w_2$, then the EWPDS reduces to a WPDS. From
now on, whenever we talk about generalized pushdown problems in this chapter, we mean
it in the context of EWPDSs.
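As a sanity check, the three properties of Defn. 3.1.2 can be tested on sample weights for the merge function $g_r(w_1, w_2) = w_1 \otimes f(r) \otimes w_2$. The sketch below assumes a min-plus weight domain (combine is `min`, extend is `+`), and `is_merge_function` and `ignore_caller` are hypothetical names introduced for illustration. A check on finitely many samples is evidence, not a proof.

```python
import itertools

INF = float("inf")
ZERO, ONE = INF, 0   # min-plus semiring constants (assumed weight domain)

def combine(a, b):
    return min(a, b)

def extend(a, b):
    return a + b

def wpds_merge(f_r):
    """The merge function that makes an EWPDS behave like a plain WPDS."""
    return lambda w1, w2: extend(extend(w1, f_r), w2)

def is_merge_function(g, samples):
    """Check strictness, distributivity and path extension on sample weights."""
    for a, b, c in itertools.product(samples, repeat=3):
        strict = g(ZERO, a) == ZERO and g(a, ZERO) == ZERO
        distributive = (g(combine(a, b), c) == combine(g(a, c), g(b, c))
                        and g(a, combine(b, c)) == combine(g(a, b), g(a, c)))
        path_extension = g(extend(a, b), c) == extend(a, g(b, c))
        if not (strict and distributive and path_extension):
            return False
    return True

def ignore_caller(w1, w2):
    """Strict and distributive, but it violates path extension."""
    return ZERO if ZERO in (w1, w2) else w2

samples = [0, 1, 2, 5, INF]
```

The second function shows why path extension is a separate requirement: a function can be strict and distributive and still discard the caller's weight.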

3.2  Solving  Reachability  Problems  in  EWPDSs 

In this section, we present algorithms to solve the generalized reachability problems for
EWPDSs. Let $\mathcal{W}_e = (\mathcal{P}, \mathcal{S}, f, g)$ be an EWPDS where $\mathcal{P} = (P, \Gamma, \Delta)$ is a pushdown system
and $\mathcal{S} = (D, \oplus, \otimes, \overline{0}, \overline{1})$ is the weight domain. Let $C$ be a fixed regular set of configurations
that is recognized by a $\mathcal{P}$-automaton $\mathcal{A} = (Q, \Gamma, \to_0, P, F)$. We will follow the above notation
throughout this section. As in the case of WPDSs, we will construct a weighted automaton
that represents the set of reachable configurations along with their weights. The automaton
will be the same as the automaton constructed for WPDS reachability (Section 2.4.1) except
for the weights. We will not show the calculation of witness annotations because they are
obtained in exactly the same way as for WPDSs [83]. The computation is unchanged
because witnesses record the paths that justify a weight and not how the values
of those paths were calculated.

3.2.1  Solving  GPP 

To solve GPP, we take as input a $\mathcal{P}$-automaton $\mathcal{A}$ that describes the starting set of
configurations. As output, we create a weighted automaton $\mathcal{A}_{pre^*}$. The algorithm is based
on the saturation rule shown in Fig. 3.3. Starting with the automaton $\mathcal{A}$, we keep applying
this rule until it no longer causes any changes. Termination is guaranteed because there are
a finite number of transitions and there are no infinite ascending chains in a weight domain.
For each transition in the automaton being created, we store the weight on it using function
$l$. The saturation rule is the same as that for predecessor reachability in ordinary pushdown
systems, except for the weights, and is different from the one for weighted pushdown systems
only in the last case, where a merge function is applied.

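Theorem 3.2.1. The saturation rule shown in Fig. 3.3 solves GPP for EWPDSs, i.e., for
a configuration $c = \langle p, \gamma_1 \gamma_2 \cdots \gamma_n \rangle$, $\delta(c, \mathcal{L}(\mathcal{A}))$, defined as IJOP$(\{c\}, \mathcal{L}(\mathcal{A}))$, is $\mathcal{A}_{pre^*}(c)$.

• If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \varepsilon \rangle$, then update the weight on $t = (p, \gamma, p')$ to $l(t) \leftarrow l(t) \oplus f(r)$.

• If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle$ and there is a transition $t = (p', \gamma', q)$, then update the
weight on $t' = (p, \gamma, q)$ to $l(t') \leftarrow l(t') \oplus (f(r) \otimes l(t))$.

• If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle$ and there are transitions $t = (p', \gamma', q_1)$ and $t' =
(q_1, \gamma'', q_2)$, then update the weight on $t'' = (p, \gamma, q_2)$ to

$$l(t'') \leftarrow l(t'') \oplus \begin{cases} f(r) \otimes l(t) \otimes l(t') & \text{if } q_1 \notin P \\ g_r(\overline{1}, l(t)) \otimes l(t') & \text{otherwise} \end{cases}$$

Figure 3.3 Saturation rule for constructing $\mathcal{A}_{pre^*}$ from $\mathcal{A}$. In each case, if a transition $t$
does not yet exist, it is treated as if $l(t)$ equals $\overline{0}$.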

A  proof  of  this  theorem  is  given  in  Section  3.7. 
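The saturation rule of Fig. 3.3 can be prototyped as a naive fixpoint loop. The sketch below is not the algorithm of the WPDS library; it assumes a min-plus weight domain, encodes each rule as a hypothetical tuple `(p, gamma, p2, rhs, weight, merge)`, stores the automaton as a dictionary from transitions to weights, and re-applies all rules until nothing changes, without a worklist.

```python
INF = float("inf")
ZERO, ONE = INF, 0   # min-plus semiring constants (assumed weight domain)

def combine(a, b):
    return min(a, b)

def extend(a, b):
    return a + b

def prestar(rules, automaton, control_states):
    """Saturate the automaton {(q, gamma, q2): weight} in the style of Fig. 3.3."""
    l = dict(automaton)

    def update(t, w):
        old = l.get(t, ZERO)          # a missing transition has weight 0-bar
        new = combine(old, w)
        if new != old:
            l[t] = new
            return True
        return False

    changed = True
    while changed:
        changed = False
        for p, gamma, p2, rhs, weight, merge in rules:
            if len(rhs) == 0:                      # pop rule
                changed |= update((p, gamma, p2), weight)
                continue
            for (src, sym, q1), lt in list(l.items()):
                if (src, sym) != (p2, rhs[0]):
                    continue
                if len(rhs) == 1:                  # step rule
                    changed |= update((p, gamma, q1), extend(weight, lt))
                    continue
                for (src2, sym2, q2), lt2 in list(l.items()):   # push rule
                    if (src2, sym2) != (q1, rhs[1]):
                        continue
                    if q1 not in control_states:   # unfinished call: plain extend
                        w = extend(extend(weight, lt), lt2)
                    else:                          # finished call: merge function
                        w = extend(merge(ONE, lt), lt2)
                    changed |= update((p, gamma, q2), w)
    return l

rules = [
    ("p", "n1", "p", ("ef", "n2"), 5, lambda w1, w2: w1 + w2),  # call at n1
    ("p", "ef", "p", ("xf",), 2, None),                          # body of the callee
    ("p", "xf", "p", (), 1, None),                               # return
]
after_return = prestar(rules, {("p", "n2", "acc"): ONE}, {"p"})
inside_call = prestar(rules, {("p", "xf", "s"): ONE, ("s", "n2", "acc"): ONE}, {"p"})
```

The two queries differ only in the target configuration. For the target just after the return, the weight of the start configuration comes from the merge function; for the target inside the callee, it comes from the weight of the push rule.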

3.2.2  Solving  GPS 

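For this section, we shall assume that we can have at most one rule of the form $\langle p, \gamma \rangle \hookrightarrow
\langle p', \gamma' \gamma'' \rangle$ for each combination of $p'$, $\gamma'$, and $\gamma''$. This involves no loss of generality because
we can replace a rule $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle$ with two rules: (a) $r' = \langle p, \gamma \rangle \hookrightarrow \langle p_r, \gamma' \gamma'' \rangle$
with weight $f(r)$ and merge function $g_r$, and (b) $r'' = \langle p_r, \gamma' \rangle \hookrightarrow \langle p', \gamma' \rangle$ with weight $\overline{1}$, where
$p_r$ is a new state. This replacement does not change the reachability problem’s answers.
Let lookupPushRule be a function that returns the unique push rule associated with a triple
$(p', \gamma', \gamma'')$ if there is one.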

Before presenting the algorithm, let us consider an operational definition of the value of
a rule sequence. The importance of this alternative definition is that it shows the correspondence
with the call semantics of a program. For each interprocedural path in a program, we
define a stack of weights that contains a weight for each unfinished call in the path. Elements
of the stack are from the set $D \times D \times \Delta_2$ (recall that $\Delta_2$ was defined as the set of all push
rules in $\Delta$), where $(w_1, w_2, r)$ signifies that (i) a call was made using rule $r$, (ii) the weight
at the time of the call was $w_1$, and (iii) $w_2$ was the weight on the call rule.

Let $STACK = D.(D \times D \times \Delta_2)^*$ be the set of all nonempty stacks where the topmost
element is from $D$ and the rest are from $(D \times D \times \Delta_2)$. We will write an element $(w_1, w_2, r) \in
D \times D \times \Delta_2$ as $(w_1, w_2)_r$. For each rule $r \in \Delta$ of the form $\langle p, \gamma \rangle \hookrightarrow \langle p', u \rangle$, $u \in \Gamma^*$, we will
associate a function $[\![r]\!] : STACK \to STACK$. Let $S \in (D \times D \times \Delta_2)^*$.

• If $r$ has one symbol on the right-hand side ($|u| = 1$), then accumulate its weight on
the top of the stack.

$$[\![r]\!]\ (w_1\ S) = ((w_1 \otimes f(r))\ S)$$

• If $r$ has two symbols on the right-hand side ($|u| = 2$), then save the weight of the push
rule as well as the push rule itself on the stack and start a fresh entry on the top of
the stack.

$$[\![r]\!]\ (w_1\ S) = (\overline{1}\ (w_1, f(r))_r\ S)$$

• If $r$ has no symbols on the right-hand side ($|u| = 0$), then apply the appropriate
merge function if there is something pushed on the stack. Otherwise, $r$ represents an
unbalanced pop rule, and we simply accumulate its weight on the stack.

$$[\![r]\!]\ (w_1\ (w_2, w_3)_{r_1}\ S) = (g_{r_1}(w_2, w_1 \otimes f(r))\ S) \qquad (3.1)$$

$$[\![r]\!]\ (w_1) = (w_1 \otimes f(r))$$

Note that we drop the weight $w_3$ of the push rule $r_1$ when we apply the merge function.
This is in accordance with case 5 of Defn. 3.1.4.

For a sequence of rules $\sigma = [r_1, r_2, \ldots, r_n]$, define $[\![\sigma]\!] = [\![\,[r_2, \ldots, r_n]\,]\!] \circ [\![r_1]\!]$. Let
$flatten : STACK \to D$ be an operation that computes a weight from a stack as follows:

$$\begin{array}{rcl}
flatten(w_1\ S) &=& flatten'(S) \otimes w_1 \\
flatten'((\,)) &=& \overline{1} \\
flatten'((w_1, w_2)_r\ S) &=& flatten'(S) \otimes (w_1 \otimes w_2)
\end{array}$$

Example 3.2.2. Consider the rule sequence $\sigma$ corresponding to the path in Fig. 3.1 that goes
from $e_{main}$ to $n_3$ to $n_{11}$ to $x_f$. If we apply $[\![\sigma]\!]$ to a stack containing just $\overline{1}$, we get a stack of
height 2 as follows: $[\![\sigma]\!](\overline{1}) = ((\lambda e.e[y \mapsto 2])\ (\lambda e.e[a \mapsto 5, y \mapsto 1],\ \lambda e.e[a \mapsto \top, b \mapsto e(a)])_r)$,
where $r$ is the push rule that calls procedure $f$ at node $n_3$. The top of the stack is the weight
computed inside procedure $f$, and the bottom of the stack contains a pair of weights: the first
component is the weight computed in main just before the call; the second component is just
the weight of the call rule $r$. If we apply the flatten operation on this stack, we get the weight
$\lambda e.e[a \mapsto \top, y \mapsto 2, b \mapsto 5]$, which is exactly the value $v(\sigma)$. When we apply the pop rule $r'$
corresponding to the procedure return at $x_f$ to this stack, we get:

$$\begin{array}{rcl}
[\![\sigma\ r']\!](\overline{1}) &=& [\![r']\!] \circ [\![\sigma]\!](\overline{1}) \\
&=& (g_r(\lambda e.e[a \mapsto 5, y \mapsto 1],\ \lambda e.e[y \mapsto 2])) \\
&=& (\lambda e.e[a \mapsto 5, y \mapsto 2])
\end{array}$$

Again, applying flatten on this stack gives us $v(\sigma\ r')$.
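The stack-based definition and the example can be replayed with a short Python sketch. It makes several simplifying assumptions that are not in the text: environments are dictionaries, the $\overline{0}$ weight and the $\bot$ environment are left out, identity edges are skipped, stacks are lists with the top at index 0, and rules are hypothetical tuples `("step", w)`, `("push", w, merge)`, and `("pop", w)`.

```python
TOP = "T"   # abstract value "not a constant" (assumed encoding)

ONE = lambda e: e   # semiring 1: the identity transformer

def extend(w1, w2):
    """w1 (x) w2 = w2 o w1."""
    return lambda e: w2(w1(e))

def upd(**kv):
    """lambda e. e[x -> v, ...]; a callable v is evaluated on the old e."""
    def fn(e):
        new = dict(e)
        for var, val in kv.items():
            new[var] = val(e) if callable(val) else val
        return new
    return fn

def merge_f(w1, w2):
    """Merge function for calls to f: a from the caller, y from the callee."""
    return lambda e: {**e, "a": w1(e)["a"], "y": extend(w1, w2)(e)["y"]}

def apply_rule(rule, stack):
    """The function [[r]] on stacks, following the three cases in the text."""
    kind, weight = rule[0], rule[1]
    top, rest = stack[0], stack[1:]
    if kind == "push":
        return [ONE, (top, weight, rule)] + rest
    if kind == "pop" and rest:
        (w2, _w3, push_rule), below = rest[0], rest[1:]
        return [push_rule[2](w2, extend(top, weight))] + below
    return [extend(top, weight)] + rest   # step rule or unbalanced pop

def flatten(stack):
    """Collapse a stack into one weight, deepest entry first."""
    acc = ONE
    for w1, w2, _rule in reversed(stack[1:]):
        acc = extend(acc, extend(w1, w2))
    return extend(acc, stack[0])

def run(seq, stack):
    for rule in seq:
        stack = apply_rule(rule, stack)
    return stack

sigma = [
    ("step", upd(a=5)),                                        # n1
    ("step", upd(y=1)),                                        # n2
    ("push", upd(a=TOP, b=lambda e: e["a"]), merge_f),          # call at n3
    ("step", upd(y=2)),                                        # n11
]
e0 = {"a": TOP, "b": TOP, "y": TOP}
in_callee = run(sigma, [ONE])
returned = apply_rule(("pop", ONE), in_callee)
```

Applying `flatten(in_callee)` to `e0` reproduces the weight reported for $v(\sigma)$ in the example, and `flatten(returned)` reproduces the weight after the return.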

The following lemma formalizes the equivalence between $[\![\sigma]\!]$ and $v(\sigma)$.

Lemma 3.2.3. For any valid sequence of rules $\sigma$ ($\sigma \in paths(c, c')$ for some configurations $c$
and $c'$), $[\![\sigma]\!](\overline{1}) = S$ such that $flatten(S) = v(\sigma)$.

Corollary 3.2.4. Let $C$ be a set of configurations. For a configuration $c$, let $\delta_S(C, c) \subseteq STACK$
be defined as follows:

$$\delta_S(C, c) = \{\, [\![\sigma]\!](\overline{1}) \mid \sigma \in paths(c', c),\ c' \in C \,\}.$$

Then $\delta(C, c)$, defined in Defn. 2.4.4, equals $\bigoplus \{\, flatten(S) \mid S \in \delta_S(C, c) \,\}$.

The above corollary shows that $\delta_S(C, c)$ has enough information to compute $\delta(C, c)$ directly.
To solve the pushdown successor problem, we take the input $\mathcal{P}$-automaton $\mathcal{A}$ that
describes the starting set of configurations and create a weighted automaton $\mathcal{A}_{post^*}$ from
which we can read off the value of $\delta(\mathcal{L}(\mathcal{A}), c)$ for any configuration $c$. The algorithm is
again based on a saturation rule. For each transition in the automaton being created, we
have a function $l$ that stores the weight on the transition. Based on the above operational
definition of the value of a path, we create $\mathcal{A}_{post^*}$ on pairs of weights, that is, over the
semiring $(D \times D, \oplus, \otimes, (\overline{0}, \overline{0}), (\overline{1}, \overline{1}))$ where $\oplus$ and $\otimes$ are defined component-wise. Also, we
introduce a new state for each push rule. So the states of $\mathcal{A}_{post^*}$ are $Q \cup Q_{mid}$, where $Q_{mid} =
\{\, q_{p',\gamma'} \mid \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle \in \Delta \,\}$. The saturation rule is shown in Fig. 3.4. To see what the
saturation rule does, consider a path in $\mathcal{A}_{post^*}$: $\tau = q_1 \xrightarrow{\gamma_1} q_2 \xrightarrow{\gamma_2} \cdots \xrightarrow{\gamma_n} q_{n+1}$. As an invariant
of our algorithm, we have $q_1 \in (P \cup Q_{mid})$; $q_2, \ldots, q_k \in Q_{mid}$; and $q_{k+1}, \ldots, q_{n+1} \in (Q - P)$
for some $0 \le k \le n + 1$. This is because we never create transitions to a
state in $P$ or from a state in $Q - P$ to a state in $Q_{mid}$. Define a new transition label $l'(t)$ as
follows:

$$l'(p, \gamma, q) = lookupPushRule(p', \gamma', \gamma) \quad \text{if } p \equiv q_{p',\gamma'}$$

Another invariant of our algorithm is that every transition $t$ to a state in $Q_{mid}$ has $l'(t)$
defined. Then the path $\tau$ describes the STACK $vpath(\tau) = (l_1(t_1)\ l(t_2)_{l'(t_2)} \cdots l(t_k)_{l'(t_k)})$
where $t_i = (q_i, \gamma_i, q_{i+1})$ and $l_1(t)$ is the first component projected out of the weight-pair $l(t)$.
This means that each path in $\mathcal{A}_{post^*}$ represents a STACK and all the saturation algorithm
does is to make the automaton rich enough to encode all STACKs in $\delta_S(\mathcal{L}(\mathcal{A}), c)$ for all
configurations $c$. The first and third cases of the saturation rule can be seen as applying $[\![r]\!]$
for rules with one and two stack symbols on the right-hand side, respectively. Applying the
fourth case immediately after the second case can be seen as applying $[\![r]\!]$ for pop rules. We
now have the following theorem.

Theorem 3.2.5. The saturation rule shown in Fig. 3.4 solves GPS for EWPDSs. For a
configuration $c = \langle p, u \rangle$, we have

$$\delta(\mathcal{L}(\mathcal{A}), c) = \bigoplus \{\, flatten(vpath(\sigma_t)) \mid \sigma_t \in paths(p, u, q_f),\ q_f \in F \,\}$$

where $paths(p, u, q_f)$ denotes the set of all paths of transitions in $\mathcal{A}_{post^*}$ that go from $p$ to $q_f$
on input $u$, i.e., $p \xrightarrow{u}{}^{*} q_f$.

A  proof  of  this  theorem  is  given  in  Section  3.7. 

An easy way to compute the combine in Thm. 3.2.5 is to replace the annotation $l(t)$ on
each transition $t$ with $l_1(t) \otimes l_2(t)$, the extend of the two weight components of $l(t)$, and then
use the path_summary algorithm.

3.2.3  Relaxing  Merge  Function  Requirements 

• If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \rangle$ and there is a transition $t = (p, \gamma, q)$ with annotation
$l(t)$, then update the annotation on transition $t' = (p', \gamma', q)$ to

$$l(t') \leftarrow l(t') \oplus (l(t) \otimes (f(r), \overline{1})).$$

• If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \varepsilon \rangle$ and there is a transition $t = (p, \gamma, q)$ with annotation
$l(t)$, then update the annotation on transition $t' = (p', \varepsilon, q)$ to

$$l(t') \leftarrow l(t') \oplus (l(t) \otimes (f(r), \overline{1})).$$

• If $r = \langle p, \gamma \rangle \hookrightarrow \langle p', \gamma' \gamma'' \rangle$ and there is a transition $t = (p, \gamma, q)$ with annotation $l(t)$,
then let $t' = (p', \gamma', q_{p',\gamma'})$, $t'' = (q_{p',\gamma'}, \gamma'', q)$ and update annotations on them.

$$l(t') \leftarrow l(t') \oplus (\overline{1}, \overline{1})$$

$$l(t'') \leftarrow l(t'') \oplus (l(t) \otimes (\overline{1}, f(r)))$$

• If there are transitions $t = (p, \varepsilon, q)$ and $t' = (q, \gamma', q')$ with annotations $l(t) =
(w_1, w_2)$ and $l(t') = (w_3, w_4)$, then update the annotation on the transition $t'' =
(p, \gamma', q')$ to $l(t'') \leftarrow l(t'') \oplus w$ where $w$ is defined as follows:

$$w = \begin{cases} (g_{lookupPushRule(p', \gamma'', \gamma')}(w_3, w_1),\ \overline{1}) & \text{if } q \equiv q_{p',\gamma''} \\ l(t') \otimes l(t) & \text{otherwise} \end{cases}$$

Figure 3.4 Saturation rule for constructing $\mathcal{A}_{post^*}$ from $\mathcal{A}$. In each case, if a transition $t'$
(or $t''$) does not yet exist, it is treated as if $l(t')$ (or $l(t'')$) equals $(\overline{0}, \overline{0})$.

Defn. 3.1.2 requires merge functions to satisfy three properties. The first requirement
(strictness) can be easily satisfied and the second requirement of distributivity is essential
for saturation algorithms to work for the GPP and GPS problems. However, in some cases
we might not be able to satisfy the third property of path-extension (Section 3.3 presents
one such case). Let us now consider what happens when merge functions do not satisfy this
property.

The prestar algorithm of Section 3.2.1 (used for creating $\mathcal{A}_{pre^*}$) would still be correct
because it parses rule sequences exactly as described in Defn. 3.1.4, but the poststar algorithm
of Section 3.2.2 (used for creating $\mathcal{A}_{post^*}$) would not work as it utilizes a different
parsing and relies on the path-extension property for computing the correct value. Instead
of trying to modify the poststar algorithm, we introduce an alternative definition of the
value of a rule sequence that is suited for the cases when merge functions do not satisfy the
path-extension property. The definition involves presenting a slightly more complicated but
intuitive grammar for parsing rule sequences.

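Definition 3.2.6. Given an EWPDS $\mathcal{W}_e = (\mathcal{P}, \mathcal{S}, f, g)$ where the merge functions do not
satisfy the path-extension property, the value of a rule sequence $\sigma \in \Delta^*$ is calculated in
the same manner as given in Defn. 3.1.4, but we change the productions and valuations of
balanced sequences as follows:

$$\begin{array}{lll}
\sigma_{b'} \to [\,] \mid \sigma_{b'}\ \sigma_{b'} & v(\sigma_{b'}\ \sigma_{b'}) = v(\sigma_{b'}) \otimes v(\sigma_{b'}) & \\
\phantom{\sigma_{b'} \to{}} \mid \sigma_b\ R_2\ \sigma_b\ R_0 & v(\sigma_b\ R_2\ \sigma_b\ R_0) = g_{R_2}(v(\sigma_b),\ v(\sigma_b) \otimes v(R_0)) & (2) \\
\sigma_b \to \sigma_{b'}\ \sigma_s & v(\sigma_{b'}\ \sigma_s) = v(\sigma_{b'}) \otimes v(\sigma_s) &
\end{array}$$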

The value of a rule sequence as defined above is the same as the value defined by
Defn. 3.1.4 when merge functions satisfy the path-extension property. In the absence of
the property, we need to make sure that each occurrence of a merge function is applied to
the weight computed in the calling procedure just before the call and the weight computed
by the called procedure. We enforce this using Eqn. (2). The values that we calculate for rule
sequences in Section 3.2.2 also do the same in Eqn. (3.1). This means that Lem. 3.2.3
still holds and the poststar algorithm correctly solves this more general version of GPS.
However, the prestar algorithm is closely based on Defn. 3.1.4 and the way that it
solves the generalized version of GPP is not based on the above alternative definition.

3.3  Knoop  and  Steffen’s  Coincidence  Theorem 

In this section, we show how EWPDSs can encode Knoop and Steffen’s coincidence
theorem [52] about interprocedural dataflow analysis in the presence of local variables. We
refer to the IJOP value defined by Knoop and Steffen as the interprocedural-local-join-over-all-paths
(LJOP) value. (Note that we are using different terminology than that used in
[52]. This is to avoid confusion with the other terms defined in the previous chapters.)

We are given a join semilattice $(C, \sqcup)$ that describes dataflow facts and the interprocedural-control-flow
graph of a program $(\mathcal{N}, \mathcal{E})$, where $\mathcal{N}_c, \mathcal{N}_r \subseteq \mathcal{N}$ are the sets of call and return
nodes, respectively. We are also given a semantic transformer for each node in the program:
$[\![\,\cdot\,]\!] : \mathcal{N} \to (C \to C)$, which represents the effect of executing a statement in the program.
Let $STK = C^+$ be the set of all nonempty stacks with elements from $C$. $STK$ is used as an
abstract representation of the run-time stack of a program. Define the following operations
on stacks.

$$\begin{array}{ll}
newstack : C \to STK & \text{creates a new stack with a single element} \\
push : STK \times C \to STK & \text{pushes a new element on top of the stack} \\
pop : STK \to STK & \text{removes the topmost element of the stack} \\
top : STK \to C & \text{returns the topmost element of the stack}
\end{array}$$

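We can now describe the interprocedural semantic transformer for each program node:
$[\![\,\cdot\,]\!]^* : \mathcal{N} \to (STK \to STK)$. For $stk \in STK$,

$$[\![n]\!]^*(stk) = \begin{cases}
push(pop(stk),\ [\![n]\!](top(stk))) & \text{if } n \in \mathcal{N} - (\mathcal{N}_c \cup \mathcal{N}_r) \\
push(stk,\ [\![n]\!](top(stk))) & \text{if } n \in \mathcal{N}_c \\
push(pop(pop(stk)),\ \mathcal{R}_n(top(pop(stk)),\ [\![n]\!](top(stk)))) & \text{if } n \in \mathcal{N}_r
\end{cases}$$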

where $\mathcal{R}_n : C \times C \to C$ is a merge function like we have in EWPDSs. It is applied to the
dataflow value computed by the called procedure ($[\![n]\!](top(stk))$) and the value computed by
the caller at the time of the call ($top(pop(stk))$). The definition assumes that a dataflow fact
in $C$ contains all information that is required by a procedure so that each transformer has to
look at only the top of the stack passed to it, except for return nodes, where the transformer
looks at the top two elements of the stack. We define a path transformer as follows: if $p =
[n_1\ n_2 \cdots n_k]$ is a valid interprocedural path in the program then $[\![p]\!]^* = [\![\,[n_2 \cdots n_k]\,]\!]^* \circ [\![n_1]\!]^*$.
This leads to the following definition.

Definition 3.3.1. [52] If $s \in \mathcal{N}$ is the starting node of a program, then for $c_0 \in C$ and
$n \in \mathcal{N}$, the interprocedural-local-join-over-all-paths value is defined as follows:

$$LJOP_{c_0}(n) = \bigsqcup \{\, [\![p]\!]^*(newstack(c_0)) \mid p \in VPaths(s, n) \,\}$$

where $VPaths(s, n)$ is the set of all valid interprocedural paths from $s$ to $n$ and join of stacks
is just the join of their topmost values: $stk_1 \sqcup stk_2 = top(stk_1) \sqcup top(stk_2)$.
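The stack operations and the lifted transformer $[\![n]\!]^*$ can be sketched directly in Python. The encoding is an assumption made for illustration: stacks are tuples with the top at the end, dataflow facts are pairs `(a, y)`, and `lift` and `path_transformer` are hypothetical names. The sketch evaluates the transformer of a single path; it does not compute the join over all valid paths.

```python
def newstack(c):
    return (c,)

def push(stk, c):
    return stk + (c,)

def pop(stk):
    return stk[:-1]

def top(stk):
    return stk[-1]

def lift(kind, sem, merge=None):
    """Lift a node transformer sem : C -> C to stacks, by node kind."""
    def transformer(stk):
        if kind == "call":
            # Keep the caller's fact and push a fresh fact for the callee.
            return push(stk, sem(top(stk)))
        if kind == "return":
            # Merge the caller's fact at the call with the callee's result.
            return push(pop(pop(stk)), merge(top(pop(stk)), sem(top(stk))))
        return push(pop(stk), sem(top(stk)))
    return transformer

def path_transformer(nodes):
    """Compose lifted transformers along a path, first node first."""
    def run(stk):
        for node in nodes:
            stk = node(stk)
        return stk
    return run

path = [
    lift("plain", lambda c: (5, c[1])),        # a = 5 in the caller
    lift("call", lambda c: (None, c[1])),      # hide the caller's local a
    lift("plain", lambda c: (c[0], 2)),        # y = 2 in the callee
    lift("return", lambda c: c,
         merge=lambda caller, callee: (caller[0], callee[1])),
]
result = path_transformer(path)(newstack((0, 0)))
```

After the return node the stack is back to height one, with the local restored from the caller's frame and the global taken from the callee.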

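We now construct an EWPDS $\mathcal{W}_e = (\mathcal{P}, \mathcal{S}, f, g)$ to compute this value when $C$ has
no infinite ascending chains, all semantic transformers $[\![n]\!]$ are distributive, and all merge
relations $\mathcal{R}_n$ are distributive in each of their arguments. Define a semiring $\mathcal{S} = (D, \oplus, \otimes, \overline{0}, \overline{1})$
as $D = [C \to C] \cup \{\overline{0}\}$, which consists of the set of all distributive functions on $C$ and a special
function $\overline{0}$. The semiring operations are defined as follows. For $a, b \in D$,

$$a \oplus b = \begin{cases} a & \text{if } b = \overline{0} \\ b & \text{if } a = \overline{0} \\ (a \sqcup b) & \text{otherwise} \end{cases}
\qquad
a \otimes b = \begin{cases} \overline{0} & \text{if } a = \overline{0} \text{ or } b = \overline{0} \\ (b \circ a) & \text{otherwise} \end{cases}
\qquad
\overline{1} = \lambda c.c$$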
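A minimal sketch of this semiring, assuming that dataflow facts are Python frozensets with union as the join and that the special function $\overline{0}$ is represented by a sentinel object. The transformers `gen` and `kill` are hypothetical examples of distributive functions.

```python
ZERO = object()        # the special function 0-bar (sentinel, assumed encoding)
ONE = lambda c: c      # the identity function, lambda c. c

def combine(a, b):
    """a (+) b: 0-bar is the identity, otherwise join pointwise."""
    if b is ZERO:
        return a
    if a is ZERO:
        return b
    return lambda c: a(c) | b(c)

def extend(a, b):
    """a (x) b: 0-bar annihilates, otherwise apply a first, then b."""
    if a is ZERO or b is ZERO:
        return ZERO
    return lambda c: b(a(c))

gen = lambda c: c | {"x"}     # adds the fact x
kill = lambda c: c - {"x"}    # removes the fact x
fact = frozenset({"y"})
```

Here `combine(gen, kill)(fact)` joins the results of the two transformers, and `extend(gen, kill)(fact)` runs `gen` before `kill`.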

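The pushdown system $\mathcal{P}$ is $(\{p\}, \mathcal{N}, \Delta)$, where $\Delta$ is constructed by including a rule for
each edge in $\mathcal{E}$. First, let $\mathcal{E}_{intra} \subseteq \mathcal{E}$ be the intraprocedural edges and $\mathcal{E}_{inter} \subseteq \mathcal{E}$ be the
interprocedural (call and return) edges. Then include the following rules in $\Delta$.

1. For $(n, m) \in \mathcal{E}_{intra}$, include the rule $r = \langle p, n \rangle \hookrightarrow \langle p, m \rangle$ with $f(r) = [\![n]\!]$.

2. For $n \in \mathcal{N}_c$ and $(n, m) \in \mathcal{E}_{inter}$, where $n_r \in \mathcal{N}_r$ is the return site for the call at $n$,
include the rule $r = \langle p, n \rangle \hookrightarrow \langle p, m\, n_r \rangle$ with $f(r) = [\![n]\!]$ and

$$g_r(a, b) = \lambda c.\, \mathcal{R}_n(a(c), (a \otimes [\![n]\!] \otimes b \otimes [\![n_r]\!])(c)).$$

3. For $n \in \mathcal{N}$, if it is an exit node of a procedure, include the rule $r = \langle p, n \rangle \hookrightarrow \langle p, \varepsilon \rangle$
with $f(r) = [\![n]\!]$.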
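The three kinds of rules can be generated mechanically from the edge sets. The following Python sketch shows only the shape of the rules; the weights $f(r)$ and merge functions $g_r$ are omitted, and the function name `build_rules`, the tuple encoding, and the tiny example program are all hypothetical.

```python
def build_rules(intra_edges, call_edges, return_site, exit_nodes):
    """Return pushdown rules as (kind, left stack symbol, right stack word).

    The single control location p is left implicit. Weights are omitted.
    """
    rules = []
    for n, m in intra_edges:
        # <p, n> -> <p, m>
        rules.append(("step", n, (m,)))
    for n, m in call_edges:
        # <p, n> -> <p, m n_r>: enter the callee, remember the return site
        rules.append(("push", n, (m, return_site[n])))
    for n in exit_nodes:
        # <p, n> -> <p, epsilon>
        rules.append(("pop", n, ()))
    return rules


# Hypothetical program: main is n1 -> n2 (call foo) with return site n3;
# foo is f1 -> f2. Nodes n3 and f2 are exit nodes.
rules = build_rules(
    intra_edges=[("n1", "n2"), ("f1", "f2")],
    call_edges=[("n2", "f1")],
    return_site={"n2": "n3"},
    exit_nodes=["n3", "f2"],
)
```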

The merge functions defined above need not satisfy the path-extension property given in
Defn.  3.1.2,  but  the  techniques  presented  in  Section  3.2.3  still  allow  us  to  solve  GPS.  This 
leads  us  to  the  following  theorem. 

Theorem 3.3.2. Let $A$ be a $\mathcal{P}$-automaton that accepts just the configuration $\langle p, s \rangle$, where
$s$ is the starting point of the program, and let $A_{post^*}$ be the automaton obtained by using the
saturation  rule  shown  in  Fig.  S.j  on  A. 
(1) If $\delta(\mathcal{L}(A), c)$ is read off $A_{post^*}$ in accordance with Thm. 3.2.5, then for $c_0 \in \mathcal{C}$ and $n \in \mathcal{N}$, we
have:

$$\mathrm{LJOP}_{c_0}(n) = \Big[ \bigoplus \{ \delta(\mathcal{L}(A), \langle p, n\,u \rangle) \mid u \in \Gamma^* \} \Big](c_0).$$

(2) If $L \subseteq \Gamma^*$ is a regular language of stack configurations, then $\mathrm{LJOP}_{c_0}(n, L)$, which is the
LJOP value restricted to only those paths that end in configurations described by $L$, can be
calculated as follows:

$$\mathrm{LJOP}_{c_0}(n, L) = \Big[ \bigoplus \{ \delta(\mathcal{L}(A), \langle p, n\,u \rangle) \mid u \in L \} \Big](c_0).$$

Result  (1)  in  Thm.  3.3.2  shows  how  an  EWPDS  can  capture  Knoop  and  Steffen’s  result. 
Result  (2)  is  an  extension  of  their  theorem;  it  gives  us  a  way  of  performing  stack-qualified 
queries  in  the  presence  of  local  variables. 

In case the semantic transformers $[\![\cdot]\!]$ and $\mathcal{R}_n$ are not distributive but only monotonic,
then, in the two equations of Thm. 3.3.2, the right-hand sides safely approximate $\mathrm{LJOP}_{c_0}(n)$
and $\mathrm{LJOP}_{c_0}(n, L)$, respectively.

3.4  EWPDS  Experiments 

In  [3],  Balakrishnan  and  Reps  present  an  algorithm  to  analyze  memory  accesses  in  x86 
code.  Its  goal  is  to  determine  an  over-approximation  of  the  set  of  values/memory-addresses 
that  each  register  and  memory  location  holds  at  each  program  point.  The  core  dataflow- 
analysis  algorithm  used,  called  value-set  analysis  (VSA),  is  not  relational,  i.e. ,  it  does  not  keep 
track  of  the  relationships  that  hold  among  registers  and  memory  locations.  However,  when 
interpreting  conditional  branches,  specifically  those  that  implement  loops,  it  is  important 
to  know  such  relationships.  Hence,  a  separate  affine-relation  analysis  (ARA)  is  performed 
to  recover  affine  relations  that  hold  among  the  registers  at  conditional  branch  points;  those 
affine  relations  are  then  used  to  interpret  conditional  branches  during  VSA.  ARA  recovers 
affine  relations  involving  registers  only,  because  recovering  affine  relations  involving  memory 
locations  would  require  points-to  information,  which  is  not  available  until  the  end  of  VSA. 
ARA  is  implemented  by  encoding  the  x86  program  as  an  EWPDS  using  the  weight  domain 
from [68]. It is based on machine arithmetic, i.e., arithmetic modulo $2^{32}$, and is able to take
care  of  overflow.  (The  encoding  is  similar  to  the  one  described  in  Section  3.5.2.) 

Before  each  call  instruction,  a  subset  of  the  registers  is  saved  on  the  stack,  either  by  the 
caller  or  the  callee,  and  restored  at  the  return.  Such  registers  are  called  the  caller-save  and 
callee-save  registers.  Because  ARA  only  keeps  track  of  information  involving  registers,  when 
ARA  is  implemented  using  a  WPDS,  all  affine  relations  involving  caller-save  and  callee-save 
registers  are  lost  at  a  call.  We  used  an  EWPDS  to  preserve  them  across  calls  by  treating 
caller-save  and  callee-save  registers  as  local  variables  at  a  call;  i.e.,  the  values  of  caller-save 
and  callee-save  registers  after  the  call  are  set  to  the  values  before  the  call  and  the  values  of 
other  registers  are  set  to  the  values  at  the  exit  node  of  the  callee. 
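The effect of treating saved registers as locals can be illustrated on concrete register states. The sketch below is not the ARA weight domain itself; it only shows, on plain values, which side each register is taken from at a return. The function name `merge_at_return` and the choice of saved registers are hypothetical.

```python
def merge_at_return(before_call, at_callee_exit, saved):
    """Saved registers keep their pre-call value; the rest come from the callee's exit."""
    return {
        reg: before_call[reg] if reg in saved else at_callee_exit[reg]
        for reg in before_call
    }


before = {"eax": 1, "ebx": 7, "esi": 3}    # register values at the call
at_exit = {"eax": 42, "ebx": 0, "esi": 9}  # register values at the callee's exit node
after = merge_at_return(before, at_exit, saved={"ebx", "esi"})
```

Here `ebx` and `esi` play the role of local variables of the caller, while `eax` is taken from the callee.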

The  results  are  shown  in  Tab.  3.1.  The  column  labeled  ‘Branches  with  useful  information’ 
refers  to  the  number  of  branch  points  at  which  ARA  recovered  at  least  one  affine  relation. 
The  last  column  shows  the  number  of  branch  points  at  which  ARA  implemented  via  an 
EWPDS  recovered  more  affine  relations  when  compared  to  ARA  implemented  via  a  WPDS. 
Tab.  3.1  shows  that  the  information  recovered  by  EWPDS  is  better  in  30%  to  63%  of  the 
branch  points  that  had  useful  information.  The  EWPDS  version  is  somewhat  slower,  but 
uses  less  space;  this  is  probably  due  to  the  fact  that  the  dataflow  transformer  from  [68]  for 
‘spoiling’  the  affine  relations  that  involve  a  given  register  uses  twice  the  space  of  a  transformer 
that  preserves  such  relations. 

                                              Memory (MB)    Time (s)      Branches with useful information
Prog      Insts   Procs  Branches  Calls   WPDS  EWPDS   WPDS  EWPDS   WPDS  EWPDS  Improvement
mplayer2   58452    608      4608   2481     27      6      8      9    137    192    57 (42%)
print      96096    955      8028   4013     61     19     20     23    601    889   313 (52%)
attrib     96375    956      8076   4000     40      8     12     13    306    380    93 (30%)
tracert   101149   1008      8501   4271     70     22     24     27    659   1021   387 (59%)
finger    101814   1032      8505   4324     70     23     24     30    627    999   397 (63%)
lpr       131721   1347     10641   5636    102     36     36     46   1076   1692   655 (61%)
rsh       132355   1369     10658   5743    104     36     37     45   1073   1661   616 (57%)
javac     135978   1397     10899   5854    118     43     44     58   1376   2001   666 (48%)
ftp       150264   1588     12099   6833    121     42     43     61   1364   2008   675 (49%)
winhlp32  179488   1911     15296   7845    156     58     62     98   2105   2990   918 (44%)
regsvr32  297648   3416     23035  13265    279    117    145    193   3418   5226  1879 (55%)
notepad   421044   4922     32608  20018    328    124    147    390   3882   5793  1988 (51%)
cmd       482919   5595     37989  24008    369    144    175    444   4656   6856  2337 (50%)

Table 3.1 Comparison of ARA results implemented using EWPDS versus WPDS.

3.5 Applications of EWPDSs

In this section, we show how different problems can be encoded using EWPDSs. We give
encodings for Boolean programs, affine-relation analysis, and single-level pointer analysis.
All of these encodings benefit from the use of merge functions.

3.5.1 Boolean Programs

Let $B$ be a Boolean program. Let $W_e = (\mathcal{P}, \mathcal{S}, f, g)$ be an EWPDS. We will use $W_e$ to
encode $B$. Without loss of generality, we assume that each procedure of $B$ has the same
number of local variables. Let $G$ be the set of valuations of the global variables and $Val$ be
the set of valuations of local variables. The actions of program statements and conditions
are now binary relations on $G \times Val$; thus, the weight domain $\mathcal{S}$ is a relational weight domain
on the finite set $G \times Val$ (Defn. 2.4.13). The PDS $\mathcal{P}$ and the weight assignment $f$ are defined in
a manner similar to the encoding for dataflow models: $\mathcal{P}$ encodes the control flow, and the
weight of a rule that is produced from an edge $e$ is the binary relation of the statement on
$e$. The weight on a call rule forgets the values of the local variables (under the assumption
that in a Boolean program, local variables of a procedure are uninitialized at the start of the
procedure):


$$\text{weight on call rule} = \{ (g, l_1, g, l_2) \mid g \in G,\ l_1, l_2 \in Val \}$$

This avoids passing the values of local variables from a caller to the callee. The weight on
a return rule does the same to avoid passing values of local variables from the callee to the
caller:

$$\text{weight on return rule} = \{ (g, l_1, g, l_2) \mid g \in G,\ l_1, l_2 \in Val \}$$

What  remains  to  be  defined  are  the  merge  functions. 

Because  different  weights  can  refer  to  local  variables  from  different  procedures,  one  cannot 
take  relational  composition  of  weights  from  different  procedures.  The  project  function  is  used 
to  change  the  scope  of  a  weight.  It  existentially  quantifies  out  the  current  transformation 
on  local  variables  and  replaces  it  with  an  identity  relation.  Formally,  it  can  be  defined  as 
follows: 

$$\mathit{project}(w) = \{ (g_1, l_1, g_2, l_1) \mid (g_1, l_1, g_2, l_2) \in w \}.$$

Once  the  summary  of  a  procedure  is  calculated  as  a  weight  w  involving  local  variables 
of  the  procedure,  the  project  function  is  applied  to  it,  and  the  result  project(w)  is  passed  to 
the  callers  of  that  procedure.  This  makes  sure  that  local  variables  of  one  procedure  do  not 
interfere  with  those  of  another  procedure.  Thus,  merge  functions  for  Boolean  programs  all 
have  the  form 

$$g(w_1, w_2) = w_1 \otimes \mathit{project}(w_2).$$
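A small Python sketch may make the role of project concrete. It assumes a hypothetical program with one global bit and one local bit, represents a weight as a set of tuples `(g1, l1, g2, l2)`, and uses relational composition as extend; all names (`G`, `VAL`, `w_caller`, `w_callee`) are illustrative.

```python
from itertools import product

G = [0, 1]    # valuations of the (single, hypothetical) global variable
VAL = [0, 1]  # valuations of the (single, hypothetical) local variable


def extend(w1, w2):
    """Relational composition: run w1, then w2."""
    return {(g1, l1, g3, l3)
            for (g1, l1, g2, l2) in w1
            for (h2, k2, g3, l3) in w2
            if (g2, l2) == (h2, k2)}


def project(w):
    """Keep the effect on globals; replace the effect on locals by the identity."""
    return {(g1, l1, g2, l1) for (g1, l1, g2, _l2) in w}


def merge(w1, w2):
    """g(w1, w2) = w1 (x) project(w2)."""
    return extend(w1, project(w2))


# Caller path up to the call: sets its local to 1 and leaves the global alone.
w_caller = {(g, l, g, 1) for g, l in product(G, VAL)}
# Callee summary: sets the global to 1; its own locals are arbitrary.
w_callee = {(g, l1, 1, l2) for g, l1, l2 in product(G, VAL, VAL)}
```

With merge, the caller's local value 1 survives the call; with a plain extend, the callee's arbitrary local values would leak into the caller's local.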

For encoding Boolean programs with other abstractions, such as finding the shortest
trace, one can use the relational weight domain on $(G \times Val, \mathcal{M})$ (Defn. 2.4.16), where $\mathcal{M}$
is the minpath semiring (Defn. 2.4.14). The weights on rules change as follows: if $w$ is the
weight of a rule obtained from the Boolean program (as defined earlier), then replace it with
the following weight that attaches the value $1 \in \mathcal{M}$ to it:

$$\lambda (g_1, l_1, g_2, l_2).\ \text{if } (g_1, l_1, g_2, l_2) \in w \text{ then } 1 \text{ else } \infty$$

The project function on weights from this domain can be defined as follows:

$$\mathit{project}(w) = \lambda (g_1, l_1, g_2, l_2).\ \text{if } (l_1 \neq l_2) \text{ then } \infty \text{ else } \bigoplus\nolimits_{\mathcal{M}} \{ w(g_1, l_1, g_2, l) \mid l \in Val \}$$

Again, the merge functions all have the form $g(w_1, w_2) = w_1 \otimes \mathit{project}(w_2)$.
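The shortest-trace variant can be sketched the same way. In this illustration a weight is a Python dict from tuples `(g1, l1, g2, l2)` to path lengths, with absent keys standing for $\infty$; combine in the minpath semiring is `min` and extend adds lengths. The helper names and the example weights are hypothetical.

```python
import math

INF = math.inf  # weight of tuples that are not related at all


def lift(relation):
    """Attach length 1 to every tuple of a Boolean-program weight."""
    return {t: 1 for t in relation}


def extend(w1, w2):
    """Min-plus relational composition: shortest way through w1 and then w2."""
    out = {}
    for (g1, l1, g2, l2), d1 in w1.items():
        for (h2, k2, g3, l3), d2 in w2.items():
            if (g2, l2) == (h2, k2):
                key = (g1, l1, g3, l3)
                out[key] = min(out.get(key, INF), d1 + d2)
    return out


def project(w):
    """Minimize over post-state locals, then force locals to be unchanged."""
    out = {}
    for (g1, l1, g2, _l), d in w.items():
        key = (g1, l1, g2, l1)
        out[key] = min(out.get(key, INF), d)
    return out


def merge(w1, w2):
    return extend(w1, project(w2))


w_caller = lift({(0, 0, 0, 0)})                # one step that changes nothing
w_callee = {(0, 0, 1, 0): 3, (0, 0, 1, 1): 2}  # two ways to set the global to 1
```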

[Figure 3.5: An affine program, with procedures foo and bar, that starts execution at node n4. There are two global variables: $x_1$ and $x_2$.]


3.5.2  Affine  Relation  Analysis 

An affine relation is a linear-equality constraint between integer-valued variables. Affine-relation
analysis (ARA) tries to find all affine relationships that hold in the program. For
example, for the program shown in Fig. 3.5, ARA would infer that $x_2 = x_1 + 1$
at program node n4.

ARA for single-procedure programs was first addressed by Karr [45]. ARA generalizes
other analyses, including copy-constant propagation, linear-constant propagation [84], and
induction-variable analysis [45]. We have used ARA on machine code to find induction-variable
relationships between machine registers (see Section 3.4).


Affine  Programs 

Interprocedural ARA can be performed precisely on affine programs, and has been the
focus of several papers [67, 68, 36]. Affine programs are similar to Boolean programs, but
with integer-valued variables. First, we restrict our attention to affine programs with only
global variables and show how they can be encoded using WPDSs, and then show how the
addition of local variables is handled using merge functions.

If $\{x_1, x_2, \ldots, x_n\}$ is the set of global variables of the program, then all assignments in an
affine program have the form $x_j := a_0 + \sum_{i=1}^{n} a_i x_i$, where $a_0, \ldots, a_n$ are integer constants.
An assignment can also be non-deterministic, denoted by $x_j := ?$, which may assign any
integer to $x_j$. (This is typically used for abstracting assignments that cannot be modeled
as an affine transformation of the variables.) All branch conditions in affine programs are
non-deterministic.

ARA  Weight  Domain 

We briefly describe the weight domain based on the linear-algebra formulation of ARA
from [67]. An affine relation $a_0 + \sum_{i=1}^{n} a_i x_i = 0$ is represented using a column vector of size
$n + 1$: $\vec{a} = (a_0, a_1, \ldots, a_n)^t$. A valuation $\vec{x}$ of the program variables is a map from the set of
global variables to the integers. The value of $x_i$ under this valuation is written as $\vec{x}(i)$.

A valuation $\vec{x}$ satisfies an affine relation $\vec{a} = (a_0, a_1, \ldots, a_n)^t$ if $a_0 + \sum_{i=1}^{n} a_i \vec{x}(i) = 0$. An
affine relation $\vec{a}$ represents the set of all valuations that satisfy it, written as $Pts(\vec{a})$. An
affine relation $\vec{a}$ holds at a program node if the set of valuations reaching that node (in the
concrete collecting semantics) is a subset of $Pts(\vec{a})$.

An important observation about affine programs is that if affine relations $\vec{a}_1$ and $\vec{a}_2$ hold
at a program node, then so does any linear combination of $\vec{a}_1$ and $\vec{a}_2$. For example, one can
verify that $Pts(\vec{a}_1 + \vec{a}_2) \supseteq Pts(\vec{a}_1) \cap Pts(\vec{a}_2)$, i.e., the affine relation $\vec{a}_1 + \vec{a}_2$ (componentwise
addition) holds at a program node if both $\vec{a}_1$ and $\vec{a}_2$ hold at that node. The set of affine
relations that hold at a program node forms a (finite-dimensional) vector space [67]. This
implies that a (possibly infinite) set of affine relations can be represented by any of its bases;
each such basis is always a finite set.

For reasoning about affine programs, Müller-Olm and Seidl defined an abstraction that
is able to find all affine relationships in an affine program: each statement is abstracted by
a set of matrices of size $(n + 1) \times (n + 1)$. A statement $x_j := a_0 + \sum_{i=1}^{n} a_i x_i$ can be written
as $\vec{x} := A\vec{x} + \vec{b}$, where $\vec{x}$ is interpreted as a column vector of size $n$, $A$ is an $(n \times n)$ matrix,
and $\vec{b}$ is a column vector of size $n$:

$$A = \left(\begin{array}{c} I_{j-1} \quad 0 \\ a_1 \;\; a_2 \;\; \cdots \;\; a_n \\ 0 \quad I_{n-j} \end{array}\right) \qquad \vec{b} = \left(\begin{array}{c} 0 \\ a_0 \\ 0 \end{array}\right)$$

where $I_k$ is the identity matrix of size $(k \times k)$, and $a_0$ appears in the $j$th row of $\vec{b}$. Then
this statement is abstracted by a singleton set consisting of the following matrix of size
$(n + 1) \times (n + 1)$:

$$\left(\begin{array}{cc} 1 & \vec{b}^{\,t} \\ 0 & A^t \end{array}\right)$$

We refer the reader to [67] for the abstraction of other kinds of statements. The set
of matrices obtained in this way forms the weakest-precondition transformer on affine relations
for a statement: if a statement is abstracted as the set $\{m_1, m_2, \ldots, m_r\}$, then the
affine relation $\vec{a}$ holds after the execution of the statement if and only if the affine relations
$(m_1 \vec{a}), (m_2 \vec{a}), \ldots, (m_r \vec{a})$ hold before the execution of the statement.
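As a worked illustration, the matrix for a single affine assignment can be built directly: it is the identity with column $j$ replaced by $(a_0, a_1, \ldots, a_n)^t$. The Python sketch below uses plain nested lists; the function names `assignment_matrix` and `apply` are hypothetical, and index 0 is reserved for the constant term.

```python
def assignment_matrix(n, j, coeffs):
    """Matrix [[1, b^t], [0, A^t]] for x_j := coeffs[0] + sum_i coeffs[i] * x_i.

    n is the number of global variables; j is 1-based; len(coeffs) == n + 1.
    """
    m = [[1 if r == c else 0 for c in range(n + 1)] for r in range(n + 1)]
    for r in range(n + 1):
        m[r][j] = coeffs[r]  # column j holds (a_0, a_1, ..., a_n)
    return m


def apply(m, a):
    """Weakest precondition of the affine relation a: the relation m * a."""
    return [sum(x * y for x, y in zip(row, a)) for row in m]


# Hypothetical statement over two globals: x2 := 1 + x1.
m = assignment_matrix(2, 2, [1, 1, 0])
```

For instance, the relation $x_2 - 5 = 0$, encoded as `[-5, 0, 1]`, holds after the statement exactly when $x_1 - 4 = 0$, encoded as `[-4, 1, 0]`, holds before it. The relation $x_2 - x_1 - 1 = 0$ maps to the zero relation, which always holds.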

Under such an abstraction of program statements, one can define the extend operation
as matrix multiplication of each member of the first set with each member of the second set,
and the combine operation as set union. This is correct semantically, but it does not give
an effective algorithm because the matrix sets can grow in size without bound. However,
the observation that affine relations form a vector space carries over to a set of matrices as
well. One can show that the transformer $\{m_1, m_2, \ldots, m_r\}$ is semantically equivalent to the
transformer $\{m_1, m_2, \ldots, m_r, m\}$, where $m$ is any linear combination of the $m_i$ matrices.

Thus, a set of matrices can be abstracted as the (infinite) set of matrices spanned by them.
Once we have a vector space, we can represent it using any of its bases to get a finite and
bounded representation: a vector space over matrices of size $(n + 1) \times (n + 1)$ cannot have
more than $(n + 1)^2$ matrices in any basis.

If $M$ is a set of matrices, let $Span(M)$ be the vector space spanned by them. Let $\beta$
be the basis operation that takes a set of matrices and returns a basis of their span. We
can now define the weight domain. A weight $w$ is a vector space of matrices, which is
represented using any of its bases. Extend of vector spaces $w_1$ and $w_2$ is the vector space
$\{(m_1 m_2) \mid m_i \in w_i\}$. Combine of $w_1$ and $w_2$ is the vector space $\{(m_1 + m_2) \mid m_i \in w_i\}$,
which is the smallest vector space containing both $w_1$ and $w_2$. $0$ is the empty set, and $1$
is the span of the singleton set consisting of the identity matrix. The extend and combine
operations, as defined above, are operations on infinite sets. They can be implemented by
the corresponding operations on any basis of the weights. The following properties show that
it is semantically correct to operate on the elements in the basis instead of all the elements
in the vector space spanned by them:

$$\beta(w_1 \oplus w_2) = \beta(\beta(w_1) \oplus \beta(w_2))$$

$$\beta(w_1 \otimes w_2) = \beta(\beta(w_1) \otimes \beta(w_2))$$

These properties are satisfied because of the linearity of extend (matrix multiplication distributes
over addition) and combine operations.
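The basis-level implementation of extend and combine can be sketched as follows. This is one possible realization, not the one used in the source: it flattens each matrix into a vector and keeps a vector only if Gaussian elimination over the rationals leaves something nonzero. The names `basis`, `matmul`, `combine`, and `extend` are illustrative.

```python
from fractions import Fraction


def basis(matrices):
    """beta: a basis (list of k-by-k matrices) of the span of the given matrices."""
    if not matrices:
        return []
    k = len(matrices[0])
    reduced = []  # flattened, linearly independent vectors
    for m in matrices:
        v = [Fraction(x) for row in m for x in row]
        for b in reduced:
            p = next(i for i, x in enumerate(b) if x != 0)  # pivot position of b
            if v[p] != 0:
                f = v[p] / b[p]
                v = [x - f * y for x, y in zip(v, b)]
        if any(x != 0 for x in v):
            reduced.append(v)
    return [[v[r * k:(r + 1) * k] for r in range(k)] for v in reduced]


def matmul(a, b):
    k = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]


def combine(w1, w2):
    """Basis of the smallest vector space containing both spans."""
    return basis(w1 + w2)


def extend(w1, w2):
    """Basis of the span of all pairwise products m1 * m2."""
    return basis([matmul(m1, m2) for m1 in w1 for m2 in w2])


I2 = [[1, 0], [0, 1]]
E = [[0, 1], [0, 0]]
```

Because a basis of $k \times k$ matrices has at most $k^2$ elements, the lists returned here stay bounded no matter how many matrices are fed in.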

Under such a weight domain, IJOP($S$, $T$) is a weight that is the net weakest-precondition
transformer between $S$ and $T$. Suppose that this weight has the basis $\{m_1, \ldots, m_r\}$. The
affine relation that indicates that any variable valuation might hold at $S$ is $\vec{0} = (0, 0, \ldots, 0)^t$.
Thus, $\vec{0}$ holds at $S$, and the affine relation $\vec{a}$ holds at $T$ iff $m_1 \vec{a} = m_2 \vec{a} = \cdots = m_r \vec{a} = \vec{0}$. In
other words, $\vec{a}$ is in the null space of each of the $m_i$. The set of all affine relations that hold
at $T$ can be found as the intersection of the null spaces of the matrices $m_1, m_2, \ldots, m_r$.
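The final step, intersecting null spaces, amounts to computing the null space of all matrix rows stacked together. The sketch below does this with a reduced row echelon form over the rationals; the function names and the one-statement example weight are hypothetical.

```python
from fractions import Fraction


def null_space(rows, ncols):
    """Basis of all vectors a with row . a = 0 for every row."""
    mat = [[Fraction(x) for x in r] for r in rows]
    pivots = []
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(mat)) if mat[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free column
        mat[r], mat[p] = mat[p], mat[r]
        lead = mat[r][c]
        mat[r] = [x / lead for x in mat[r]]
        for i in range(len(mat)):
            if i != r and mat[i][c] != 0:
                f = mat[i][c]
                mat[i] = [x - f * y for x, y in zip(mat[i], mat[r])]
        pivots.append(c)
        r += 1
    result = []
    for fcol in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[fcol] = Fraction(1)
        for row_idx, pcol in enumerate(pivots):
            v[pcol] = -mat[row_idx][fcol]
        result.append(v)
    return result


def relations_at_target(weight_basis):
    """Affine relations a with m * a = 0 for every basis matrix m."""
    size = len(weight_basis[0])
    rows = [row for m in weight_basis for row in m]
    return null_space(rows, size)


# Hypothetical weight: the single matrix for the statement x2 := 1 + x1.
m = [[1, 0, 1], [0, 1, 1], [0, 0, 0]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

For the weight `[m]`, the only relation found is `[-1, -1, 1]`, that is, $-1 - x_1 + x_2 = 0$. If the identity matrix is also in the basis (a path on which nothing is assigned), no nontrivial relation survives.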

Incorporating  Local  Variables 

If an affine program has $n$ global variables and no local variables, then the matrices have
size $(n + 1) \times (n + 1)$. Assume, without loss of generality, that each procedure has $l$ local
variables. The statements are abstracted as a set of matrices exactly in the same manner as
before, except that the matrices have size $(n + l + 1) \times (n + l + 1)$. These matrices can be
divided into four quadrants, as shown below.

$$\left(\begin{array}{c|c} \text{I}:\ (n + 1) \times (n + 1) & \text{II}:\ (n + 1) \times l \\ \hline \text{III}:\ l \times (n + 1) & \text{IV}:\ l \times l \end{array}\right)$$

The four quadrants of a matrix describe four pieces of the transformation from pre-state to
post-state: the first quadrant encodes the contribution of pre-state values of global variables
to post-state values of global variables; the second quadrant encodes the contribution of
pre-state globals to post-state locals; the third quadrant encodes the contribution of pre-state
locals to post-state globals; and the fourth quadrant encodes the contribution of pre-state
locals to post-state locals.

As for Boolean programs, we will define a merge function using a project function that
quantifies out the local variables and replaces them with the identity transformation. This is
carried out by zeroing out the second and third quadrants, and changing the fourth quadrant
to the identity matrix. If $w$ is a set of matrices, $\mathit{project}(w)$ is defined as the application of
the following operation on all matrices of $w$:

$$\left(\begin{array}{cc} \text{I} & \text{II} \\ \text{III} & \text{IV} \end{array}\right) \mapsto \left(\begin{array}{cc} \text{I} & 0 \\ 0 & a\, I_l \end{array}\right)$$

Here $a$ is the topmost-leftmost element of the matrix. It is used to make the above operation
linear, which, in turn, makes $\mathit{merge}(w_1, w_2) = w_1 \otimes \mathit{project}(w_2)$ distribute over combine. (A
justification of this operation for quantifying out the local variables can be found in [67].)
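The project operation on a single matrix is easy to state in code. The sketch below assumes the reading described above: quadrants II and III are zeroed and quadrant IV becomes $a$ times the identity, where $a$ is the top-left entry. The function names and example matrices are hypothetical.

```python
def project(m, n, l):
    """Project an (n+1+l)-square matrix: keep quadrant I, zero II and III, set IV to a*I."""
    size = n + 1 + l
    a = m[0][0]  # topmost-leftmost element
    out = [[0] * size for _ in range(size)]
    for r in range(n + 1):
        for c in range(n + 1):
            out[r][c] = m[r][c]  # quadrant I is copied unchanged
    for k in range(n + 1, size):
        out[k][k] = a            # quadrant IV becomes a * identity
    return out


def add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Arbitrary example matrices with n = 1 global and l = 1 local.
m1 = [[1, 2, 5], [0, 1, 6], [7, 8, 9]]
m2 = [[1, 0, 1], [0, 3, 1], [1, 1, 1]]
```

Scaling quadrant IV by `a` rather than using a fixed identity is what makes `project(m1 + m2)` equal `project(m1) + project(m2)`.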

Extensions  to  ARA 

ARA  can  also  be  performed  for  modular  arithmetic  [68]  to  precisely  model  machine 
arithmetic  (which  is  modulo  2  to  the  power  of  the  word  size).  The  weight  domain  is  similar 
to  the  one  described  above. 

3.5.3 Single-Level Pointer Analysis

In this section, we define an EWPDS to find variable aliasing in programs written in a
C-like imperative language that is restricted to single-level pointers (i.e., one cannot have
pointers to pointers).² This problem was defined and solved by Landi and Ryder [61]. We
first discuss some of the results from [61], and then move on to describe an EWPDS that
finds aliasing in a program. This encoding shows the power of EWPDSs for solving different
kinds of problems. Moreover, it gives us something new: a way of answering stack-qualified
aliasing queries.

We will describe only the weight domain and merge functions for the EWPDS, because
we already know how to model the control flow of a program as a PDS (Fig. 2.4).

We say that two access expressions $a$ and $b$ are aliased (written as $(a, b)$) at a particular
program point $n$ if in some program execution they refer to the same memory location
when execution reaches $n$. We limit access expressions to variables and pointer dereferences
(written as $*p$ for an address-valued variable $p$). Given a program, we want to determine
an overapproximation of all alias pairs that hold at each program point. This problem is
also referred to as may-aliasing. In [61], this is computed in two stages. First, conditional
may-aliasing information is computed, which answers questions of the form: "if all alias pairs
in the set $A$ hold at a program point $n_1$, does the pair $(a, b)$ hold at point $n_2$?" The second
stage then uses this information to build up the final may-aliasing table.

An important property that results from the fact that we only have single-level pointers is
that for all program points $n_1$ and $n_2$, where $n_1$ is the enter node of the procedure containing
$n_2$, if the alias pair $(a, b)$ holds at $n_2$ under the assumption that the set $A = \{A_1, \ldots, A_m\}$
of alias pairs holds at $n_1$, then either (i) we can prove that $(a, b)$ holds at $n_2$, assuming that
no alias pair holds at $n_1$, or (ii) there exists a $k$, $1 \le k \le m$, such that assuming that just $A_k$
holds at $n_1$ suffices to prove that $(a, b)$ holds at $n_2$. In other words, we only need to compute
conditional may-alias information for each alias pair $A_k \in A$, rather than for each subset of
$A$.

² For languages in which more than one level of indirection is possible, the algorithm for single-level
pointers still provides a safe solution (i.e., an overapproximation) [61].

We say that the alias pair $(a, \bullet)$ holds at program point $n$ if $a$ is aliased to some access
expression that is not visible (i.e., out of scope) in the procedure containing $n$. It is not
necessary to know the particular invisible access expression to which $a$ is aliased because a
procedure will always have the same effect on all alias pairs that contain access expression
$a$ and any invisible access expression [61].

For a given program, let $V$ denote the set of all its variables and pointer dereferences.
Assume that all variables have different names (local variables can be prefixed by the name
of the procedure that contains them) so that there are no name conflicts. The set
$AP = (V \times V) \cup (V \times \{\bullet\}) \cup (\{\bullet\} \times V)$ is the set of all possible alias pairs. Let $AP_\bot = AP \cup \{\bot\}$,
where $\bot$ represents the absence of an alias pair.

We now construct a weight domain over the set $D = (AP_\bot \to 2^{AP})$ of all functions
$w$ from $AP_\bot$ to the power set of $AP$ with the following monotonicity restriction: for all
$x \in AP$, $w(\bot) \subseteq w(x)$. Operations on weights will maintain the invariant that alias relations
are symmetric (i.e., if $(a, b)$ holds, so does $(b, a)$). Each weight $w \in D$ can be efficiently
represented as a one-to-many map from $AP_\bot$ to $AP$.

An interprocedural path $P$ with weight $w$ means that if we assume $(a, b)$ to hold at the
beginning of $P$, then all pairs in $w((a, b))$ hold at the end of path $P$ when the program
execution follows $P$. The special element $\bot$ handles the case when no pair is assumed to
hold at the beginning of the path; $w(\bot)$ is the set of all alias pairs that hold at the end
of the path without assuming that any pair holds at the beginning of the path. Thus, a
weight represents conditional may-aliasing information, which motivates the monotonicity
condition introduced above.

For all $w_1 \neq 0 \neq w_2$, the semiring operations are defined as follows. For $x \in AP_\bot$,

$$(w_1 \oplus w_2)(x) = w_1(x) \cup w_2(x)$$

$$(w_1 \otimes w_2)(x) = w_2(\bot) \cup \Big( \bigcup_{y \in w_1(x)} w_2(y) \Big)$$

$$1(x) = \begin{cases} \emptyset & \text{if } x = \bot \\ \{x\} & \text{otherwise} \end{cases}$$

If path $P_1$ has weight $w_1$ and path $P_2$ has weight $w_2$, then the weight $w_1 \otimes w_2$ summarizes
the conditional alias information of the path $P_1$ followed by $P_2$. In particular, $(w_1 \otimes w_2)(x)$
consists of the alias pairs that hold from $w_2$, regardless of the value of $w_1$, together with
the alias pairs that hold from $w_2$ given $w_1(x)$. When $P_1$ and $P_2$ have the same starting and
ending points, the weight $w_1 \oplus w_2$ stores conditional aliasing information when the program
execution follows $P_1$ or $P_2$.

(The  semiring  constant  0  cannot  be  naturally  described  in  terms  of  conditional  aliasing, 
but  we  can  add  it  to  D  as  a  special  value  that  satisfies  all  properties  of  Defn.  2.4.1.) 
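The alias semiring can be prototyped with dictionaries over a small, fixed universe. The sketch below omits the semiring constant 0 and the invisible pairs for brevity; the variable set `V`, the weight `w_gen`, and the use of `None` for the special element $\bot$ are all assumptions made for illustration.

```python
from itertools import product

BOT = None                 # stands for the special element "no assumed alias pair"
V = ["a", "*p", "*q"]      # hypothetical access expressions
AP = list(product(V, V))   # alias pairs (invisible pairs omitted for brevity)
DOMAIN = [BOT] + AP


def combine(w1, w2):
    """(w1 (+) w2)(x) = w1(x) U w2(x)."""
    return {x: w1[x] | w2[x] for x in DOMAIN}


def extend(w1, w2):
    """(w1 (x) w2)(x) = w2(BOT) U union of w2(y) for y in w1(x)."""
    return {x: w2[BOT].union(*(w2[y] for y in w1[x])) for x in DOMAIN}


# The semiring constant 1: nothing is generated, every assumed pair is kept.
ONE = {x: (frozenset() if x is BOT else frozenset([x])) for x in DOMAIN}

# A weight that unconditionally generates (*p, *q) and (*q, *p) and keeps
# whatever pair was assumed; it satisfies w(BOT) <= w(x) by construction.
new_pair = frozenset([("*p", "*q"), ("*q", "*p")])
w_gen = {x: (new_pair if x is BOT else new_pair | {x}) for x in DOMAIN}
```

The monotonicity restriction is what makes `ONE` a left identity: `extend(ONE, w)` yields `w(BOT) | w(x)`, which equals `w(x)` only when `w(BOT)` is contained in `w(x)`.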

We now consider how to associate a weight to each pushdown rule in the EWPDS that
encodes the program. For a node $n$ that contains a statement of the form $x = y$, where $x$
and $y$ are pointers, the weight associated with each rule of the form $\langle p, n \rangle \hookrightarrow \cdots$ is a map,
where for each $x \in AP_\bot$, the first applicable mapping that appears in the list below is used:

$$\begin{array}{rcl}
(*y, b) & \mapsto & \{(*y, b), (*x, b)\} \\
(a, *y) & \mapsto & \{(a, *y), (a, *x)\} \\
(*x, b) & \mapsto & \emptyset \\
(a, *x) & \mapsto & \emptyset \\
(a, b) & \mapsto & \{(a, b)\} \\
\bot & \mapsto & \{(a, a) \mid a \in V\} \cup \{(*x, *y), (*y, *x)\}
\end{array}$$

Roughly speaking, this generates the alias pairs $(*x, *y)$ and $(*y, *x)$, makes the aliases of $*y$
into aliases of $*x$, and removes the previously existing alias pairs of $*x$ (except for $(*x, *x)$).
To enforce monotonicity on weights, the following closure operation is applied to the map:
$cl(w) = \lambda x.(w(x) \cup w(\bot))$. The weights on other rules that represent intraprocedural edges
can be defined similarly (see [61]).
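A Python sketch of the weight for a pointer assignment follows. It assumes the rule list reads as: pairs involving `*y` are kept and copied to `*x`, other pairs involving `*x` are killed, all remaining pairs are preserved, and $\bot$ generates the reflexive pairs plus `(*x, *y)` and `(*y, *x)`. The function name `assign_weight` and the three-expression universe are hypothetical.

```python
BOT = None  # stands for the special element "no assumed alias pair"


def assign_weight(x, y, V):
    """Weight for the pointer assignment x = y, with the closure cl already applied."""
    sx, sy = "*" + x, "*" + y
    bot = {(a, a) for a in V} | {(sx, sy), (sy, sx)}

    def rule(pair):
        a, b = pair
        if a == sy:               # aliases of *y also become aliases of *x
            return {(sy, b), (sx, b)}
        if b == sy:
            return {(a, sy), (a, sx)}
        if a == sx or b == sx:    # old aliases of *x are killed
            return set()
        return {(a, b)}           # everything else is preserved

    # cl(w) = lambda x. w(x) U w(BOT) enforces the monotonicity restriction.
    w = {(a, b): rule((a, b)) | bot for a in V for b in V}
    w[BOT] = bot
    return w


V = ["*x", "*y", "*z"]  # hypothetical access expressions
w = assign_weight("x", "y", V)
```

The order of the `if` tests mirrors the "first applicable mapping" convention: a pair such as `(*y, *x)` is handled by the first rule and never reaches the kill rule.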

For a push rule, the weight is determined according to the binding that occurs at the call
site; the definition is presented in Fig. 3.6. All pop rules have the weight 1.

The merge functions associated with push rules reflect the way conditional aliasing information
is computed for return nodes in [61]. Consider the push rule $\langle p, call_{foo} \rangle \hookrightarrow
\langle p, enter_{bar}\ return_{foo} \rangle$, which is a call to procedure bar from foo, and suppose that $bind_{call}$
is the weight associated with this rule. For local access expressions $l_1, l_2$ of foo and global
access expressions $g_1, g_2$, the following must hold.

• The alias pair $(l_1, l_2)$ holds at $return_{foo}$ only if the pair $(l_1, l_2)$ holds at the call node
$call_{foo}$.

• The alias pair $(g_1, g_2)$ holds at $return_{foo}$ only if the pair holds at $exit_{bar}$.

• The alias pair $(g_1, l_1)$ holds at $return_{foo}$ only if $(g_1, \bullet)$ holds at $exit_{bar}$ and the invisible
variable is $l_1$. This happens when a pair $(o_1, l_1)$ that held at $call_{foo}$ caused $(o_2, \bullet)$ to
hold at $enter_{bar}$ because of the call bindings ($(o_2, \bullet) \in bind_{call}((o_1, l_1))$) and this pair, in
turn, caused $(g_1, \bullet)$ to hold at $exit_{bar}$.

To encode these facts as weights for an algorithmic description of the merge functions,
we need to define certain weights and operations on them.

• Projection. For a set $S \subseteq (V \cup \{\bullet\})$, let $w_S$ be a weight that only preserves alias pairs
in $S \times S$: $w_S(\bot) = \emptyset$ and

$$w_S((a, b)) = \begin{cases} \{(a, b)\} & \text{if } a, b \in S \\ \emptyset & \text{otherwise} \end{cases}$$

• Restoration. For an access expression $v \in V$, let $w_S^v$ be a weight that changes alias
pairs when $v$ comes back in scope conditional on the set $S \subseteq (V \cup \{\bullet\})$: $w_S^v(\bot) = \emptyset$
and

$$w_S^v((a, b)) = \begin{cases} \{(a, v)\} & \text{if } b = \bullet \text{ and } a \in S \\ \{(v, b)\} & \text{if } a = \bullet \text{ and } b \in S \\ \emptyset & \text{otherwise} \end{cases}$$

• Conditional Extend. For an alias pair $(a, b)$, define $\otimes_{(a,b)}$ to be a binary operation
on weights that calculates the alias pairs that hold at the end of a path as a result of
the fact that $(a, b)$ held at a point inside the path. For $x \in AP_\bot$,

$$(w_1 \otimes_{(a,b)} w_2)(x) = \begin{cases} w_2((a, b)) & \text{if } (a, b) \in w_1(x) \\ w_2(\bot) & \text{otherwise} \end{cases}$$

$$bind_n(\bot) = \begin{array}[t]{l}
\{ (*f_i, *f_j) \mid [f_i, a_i], [f_j, a_j], a_i = a_j \} \\
\cup\ \{ (*f_i, *a_i) \mid [f_i, a_i], visible_p(a_i) \} \\
\cup\ \{ (*a_i, *f_i) \mid [f_i, a_i], visible_p(a_i) \} \\
\cup\ \{ (*f_i, \bullet) \mid [f_i, a_i], \neg visible_p(a_i) \} \\
\cup\ \{ (\bullet, *f_i) \mid [f_i, a_i], \neg visible_p(a_i) \}
\end{array}$$

$$bind_n((a, b)) = \begin{array}[t]{l}
bind_n(\bot) \\
\cup\ \{ (a, b) \mid visible_p(a), visible_p(b) \} \\
\cup\ \{ (a, \bullet) \mid visible_p(a), \neg visible_p(b) \} \\
\cup\ \{ (\bullet, b) \mid \neg visible_p(a), visible_p(b) \} \\
\cup\ \{ (a, *f_i) \mid visible_p(a), [f_i, a_i], *a_i = b \} \\
\cup\ \{ (\bullet, *f_i) \mid \neg visible_p(a), [f_i, a_i], *a_i = b \} \\
\cup\ \{ (*f_i, b) \mid visible_p(b), [f_i, a_i], *a_i = a \} \\
\cup\ \{ (*f_i, \bullet) \mid \neg visible_p(b), [f_i, a_i], *a_i = a \} \\
\cup\ \{ (*f_i, *f_j) \mid [f_i, a_i], [f_j, a_j], *a_i = a, *a_j = b \}
\end{array}$$

Figure 3.6 A function that models parameter binding for a call at program point $n$ to a
procedure named $p$. For brevity, we write $[f, a]$ to denote the fact that $f$ is a pointer-valued
formal parameter bound to actual $a$. Also, $visible_p(a)$ is true if $a$ is visible in procedure $p$.
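The three helper notions (projection, restoration, and conditional extend) can be sketched as Python functions on weights represented as callables. The sketch assumes that the restoration weight maps every pair not covered by its two listed cases to the empty set; the names `INV`, `projection`, `restoration`, and `cond_extend`, and the example weights, are hypothetical.

```python
BOT = None     # stands for the special element "no assumed alias pair"
INV = "<inv>"  # stands for the invisible access expression

EMPTY = frozenset()


def projection(S):
    """w_S: keep a pair only if both of its components are in S."""
    def w(x):
        if x is BOT:
            return EMPTY
        a, b = x
        return frozenset([x]) if a in S and b in S else EMPTY
    return w


def restoration(v, S):
    """w_S^v: replace the invisible component by v when the other one is in S."""
    def w(x):
        if x is BOT:
            return EMPTY
        a, b = x
        if b == INV and a in S:
            return frozenset([(a, v)])
        if a == INV and b in S:
            return frozenset([(v, b)])
        return EMPTY  # assumption: all remaining pairs are dropped
    return w


def cond_extend(pair, w1, w2):
    """Conditional extend of w1 and w2 with respect to `pair`."""
    return lambda x: w2(pair) if pair in w1(x) else w2(BOT)


# Hypothetical weights: w1 always yields (o1, l1); w2 turns it into (g1, INV).
w1 = lambda x: frozenset([("o1", "l1")])
w2 = lambda x: frozenset([("g1", INV)]) if x == ("o1", "l1") else EMPTY
```

Chaining `cond_extend(("o1", "l1"), w1, w2)` with `restoration("l1", {"g1"})` recovers the pair `("g1", "l1")`, which matches the third bullet of the return-node conditions.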

We can now define the merge functions. If $G$ is the set of global access expressions of
the program, then for a call from a procedure with local access expressions $L$ and binding
weight $bind_{call}$ (i.e., the weight on the push rule), the merge function is defined as follows
(where $L_e$ denotes $L \cup \{\bullet\}$):

$$\begin{array}{rl}
g(w_1, w_2) = & \text{if } (w_1 = 0 \text{ or } w_2 = 0) \text{ then } 0 \text{ else} \\
& (w_1 \otimes w_{L_e}) \\
\oplus & (w_1 \otimes bind_{call} \otimes w_2 \otimes w_G) \\
\oplus & \bigoplus_{(a, l) \in V \times L_e} ((w_1 \otimes_{(a,l)} (bind_{call} \otimes w_2)) \otimes w_G^l) \\
\oplus & \bigoplus_{(l, a) \in L_e \times V} ((w_1 \otimes_{(l,a)} (bind_{call} \otimes w_2)) \otimes w_G^l)
\end{array}$$

The first term in the combine copies over from the call site the pairs for local access expressions.
The second term copies over from the called procedure's exit site the pairs for global
access expressions. The third and fourth terms, which are combines over all pairs in $V \times L_e$
and $L_e \times V$, respectively, account for global-local access expressions, following the strategy
discussed earlier in this section.

After the EWPDS is constructed, we can run a single GPS query on the configuration
set $C = \{\langle p, \mathit{enter}_{main}\rangle\}$ (where $p$ is the single control location of the EWPDS), and obtain
the may-alias pairs as follows:

\[
\text{may-alias}(n) = \mathrm{IJOP}(C, \{\langle p, n\,u\rangle \mid u \in \Gamma^*\})(\bot).
\]

In addition to computing the Landi-Ryder may-alias pairs, we can also answer stack-qualified
queries about may-alias relationships. For instance, we can find out the may-alias
pairs that hold at $n_1$ when execution ends in the stack configuration $\langle p, n_1 n_2 \cdots n_k\rangle$.
Such queries allow us to obtain more precise information than what is obtained by merely
computing a may-aliasing query for paths that end at $n_1$ with any stack configuration.

3.6  Related  Work 

Weighted pushdown systems have been used for finding uninitialized variables, live variables,
linear constant propagation, and the detection of affine relationships. In each of these
cases, local variables are handled by introducing special paths in the transition system of the
PDS  that  models  the  program.  These  paths  skip  call  sites  to  avoid  passing  local  variables 
to  the  callee.  This  leads  to  imprecision  by  breaking  existing  relationships  between  local 
and  global  variables.  Besides  dataflow  analysis,  WPDSs  have  also  been  used  for  generalized 
authorization  problems  [87]. 

A  library  for  WPDSs  is  available  as  part  of  Moped  [50].  We  have  developed  our  own 
implementations  of  WPDSs:  WPDS++  [49]  and  WALi  [47],  both  of  which  now  support 
EWPDSs  as  well. 

Moped [32, 50] has been used for performing relational dataflow analysis, but only for
finite  abstract  domains.  Its  basic  approach  is  to  embed  the  abstract  transformer  of  each 
program  statement  into  the  rules  of  the  pushdown  system  that  models  the  program.  This 
contrasts  with  WPDSs,  where  the  abstract  transformer  is  a  separate  weight  associated  with  a 
pushdown  rule.  Moped  associates  global  variables  with  states  of  the  PDS  and  local  variables 
with  its  stack  symbols.  Then  the  stack  of  the  PDS  simulates  the  run-time  stack  of  the 
program  and  maintains  a  different  copy  of  the  local  variables  for  each  procedure  invocation. 
A  simple  pushdown  reachability  query  can  be  used  to  compute  the  required  dataflow  facts. 
The disadvantage of that approach is that it cannot handle infinite-size abstract domains
because then associating an abstract transformer with a pushdown rule would create infinitely many
pushdown rules. In contrast, an EWPDS is capable of performing an analysis on infinite-size
abstract  domains.  The  domain  used  for  copy-constant  propagation  in  Section  3.1  is  one  such 
example. 

Besides dataflow analysis, model checking of pushdown systems has also been used for
verifying security properties in programs [31, 42, 21]. Like WPDSs, EWPDSs can be used for
this purpose, but with the added precision that comes from the presence of merge functions.

The  idea  behind  the  transition  from  a  WPDS  to  an  EWPDS  is  that  we  attach  extra 
meaning  to  each  run  of  the  pushdown  system.  We  look  at  a  run  as  a  tree  of  matching  calls 
and  returns  that  push  and  pop  values  on  the  run-time  stack  of  the  program.  This  treatment 
of a program run has also been explored by Müller-Olm and Seidl [67] in an interprocedural
dataflow-analysis algorithm to identify the set of all affine relationships that hold among
program variables at each program node. They explicitly match calls and returns to avoid
passing relations involving local variables to different procedures. This allowed us to
directly translate their work into an EWPDS, which we have used for the experiments in
Section 3.4.

3.7  Proofs 

In this section, we give proofs for Thms. 3.2.1 and 3.2.5. In each case, we give an abstract
grammar problem $G$, similar to the ones shown in Section 2.4.1 for WPDSs, and then show
the following: (i) computing JOD values for $G$ is sufficient for computing IJOP on EWPDSs;
(ii) the saturation-based algorithms compute JOD values for $G$.

Fix $\mathcal{W}_e = (\mathcal{P}, \mathcal{S}, f, m)$ to be an EWPDS, where $\mathcal{P} = (P, \Gamma, \Delta)$ is the underlying PDS.
Fix $\mathcal{A} = (Q, \Gamma, \to_0, P, F)$ to be a $\mathcal{P}$-automaton. Let $m_r$ be the merge function associated
with rule $r$.

Proof  of  Thm.  3.2.1 

The  PopRuleSeq  grammar  shown  in  Fig.  2.6  characterizes  the  set  of  all  paths  in  a  PDS. 
For  WPDSs,  we  saw  how  replacing  the  rules  in  the  PopRuleSeq  grammar  with  weights  led 
to  an  abstract  grammar  problem  (shown  in  Fig.  2.13)  that  solves  GPP  for  WPDSs.  We 
follow  a  similar  strategy  for  EWPDSs,  but  we  need  to  consider  how  a  rule  sequence  is  parsed 
by  the  PopRuleSeq  grammar,  identify  balanced  rule  sequences,  and  insert  merge  functions 
accordingly. 

We say that a non-terminal $N_1$ over-approximates a non-terminal $N_2$ (possibly from a
different context-free grammar) when $L(N_1) \supseteq L(N_2)$. A grammar $G_1$ over-approximates
a grammar $G_2$ if for every non-terminal of $G_2$, there is a non-terminal of $G_1$ that
over-approximates it.

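Consider the grammar shown in Fig. 3.7. We call it the accepting rule-sequence grammar
for GPP. The productions from case 1 to 4 are from the PopRuleSeq grammar for the
PDS $\mathcal{P}_{\mathcal{A}}$ (Defn. 2.3.7), but the set of terminals is restricted to be only $\Delta$, i.e., we replace
the rules that are produced from $\mathcal{A}$ with $\varepsilon$ (in case 1 of the grammar). The productions
from case 5 add a new set of non-terminals. From Cor. 2.3.6 and Lem. 2.3.8, one can
show that $L(\mathit{AcceptingRuleSeq}[p\,\gamma_1\gamma_2\cdots\gamma_n])$ equals the set of all rule sequences that take the
configuration $\langle p, \gamma_1\cdots\gamma_n\rangle$ to a configuration accepted by $\mathcal{A}$, i.e., it equals
$\{\mathit{paths}_{\mathcal{P}}(\langle p, \gamma_1\cdots\gamma_n\rangle, c) \mid c \in L(\mathcal{A})\}$.

\[
\begin{array}{lll}
 & \text{Production} & \text{for each} \\
(1) & \mathit{PopRuleSeq}_{(q,\gamma,q')} \to \varepsilon & (q,\gamma,q') \in {\to_0} \\
(2) & \mathit{PopRuleSeq}_{(p,\gamma,p')} \to r & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\varepsilon\rangle \in \Delta \\
(3) & \mathit{PopRuleSeq}_{(p,\gamma,q)} \to r\ \mathit{PopRuleSeq}_{(p',\gamma',q)} & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\gamma'\rangle \in \Delta,\ q \in P \\
(4) & \mathit{PopRuleSeq}_{(p,\gamma,q)} \to r\ \mathit{PopRuleSeq}_{(p',\gamma',q')}\ \mathit{PopRuleSeq}_{(q',\gamma'',q)} & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\gamma'\gamma''\rangle \in \Delta,\ q, q' \in P \\
(5) & \mathit{AcceptingRuleSeq}[p\,\gamma_1\gamma_2\cdots\gamma_n] \to \mathit{PopRuleSeq}_{(p,\gamma_1,q_1)}\ \mathit{PopRuleSeq}_{(q_1,\gamma_2,q_2)} \cdots \mathit{PopRuleSeq}_{(q_{n-1},\gamma_n,q_n)} & p \in P,\ \gamma_i \in \Gamma,\ q_i \in Q \text{ for } 1 \le i \le n,\ q_n \in F
\end{array}
\]

Figure 3.7 The AcceptingRuleSeq grammar for GPP, given PDS $\mathcal{P}$ and automaton $\mathcal{A}$.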

The grammar shown in Fig. 3.8 (call it $G_{over}$) over-approximates the accepting rule-sequence
grammar when we remove the AcceptingRuleSeq non-terminals. This is easy to
prove. The grammar $G_{over}$ is obtained as follows: for each production in Fig. 3.7, except for
the ones in case 5, replace the non-terminal $\mathit{PopRuleSeq}_t$ with $A$ if $t \in P \times \Gamma \times P$, with
$B$ if $t \in P \times \Gamma \times (Q - P)$, or with $C$ if $t \in (Q - P) \times \Gamma \times (Q - P)$; a PDS rule $r$ is replaced
with $R_i$ if it has $i$ stack symbols in its right-hand side.

One can show that $L(A) \subseteq (L(\sigma_b)\, R_0)$ and $L(B) \subseteq (R_2 \cup L(\sigma_b))^*$, where $\sigma_b$ is the non-terminal
from Fig. 3.2 that derives balanced sequences. Thus, $L(\mathit{PopRuleSeq}_t) \subseteq (L(\sigma_b)\, R_0)$
for $t \in P \times \Gamma \times P$, and $L(\mathit{PopRuleSeq}_t) \subseteq (R_2 \cup L(\sigma_b))^*$ for $t \in P \times \Gamma \times (Q - P)$. One
can also show that $L(\mathit{AcceptingRuleSeq}[c]) \subseteq (L(A)^* L(B)^+)$. Moreover, $(R_2\, A)$ can only
derive balanced sequences. The two instances of $R_2\, A$ in the grammar $G_{over}$ are where merge
functions have to be slipped in, and both such instances come from case 4 productions of
the accepting rule-sequence grammar.


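\[
\begin{array}{ll}
\text{Non-terminal} & \text{Over-approximates } \mathit{PopRuleSeq}_t \text{ for } t \text{ in} \\
A & P \times \Gamma \times P \\
B & P \times \Gamma \times (Q - P) \\
C & (Q - P) \times \Gamma \times (Q - P)
\end{array}
\]

\[
\begin{array}{lcl}
A & \to & R_0 \mid R_1\, A \mid R_2\, A\, A \\
B & \to & \varepsilon \mid R_1\, B \mid R_2\, B\, C \mid R_2\, A\, B \\
C & \to & \varepsilon
\end{array}
\]

Figure 3.8 A grammar that over-approximates the grammar shown in Fig. 3.7.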

Based on the above observations about the rule sequences that can be derived from
the non-terminals of the accepting rule-sequence grammar, we construct the abstract grammar
shown in Fig. 3.9. It is similar to the abstract grammar for solving GPP on WPDSs
(Fig. 2.13). Case 4 in Fig. 3.9 corresponds to the productions $A \to R_2\, A\, A$ and $B \to R_2\, A\, B$
of $G_{over}$ and thus uses a merge function. Case 5 corresponds to the production $B \to R_2\, B\, C$,
and does not use a merge function because the call rule $R_2$ cannot have any matching return
rule in the rule sequence derived from $(B\ C)$.

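By our construction, $\mathrm{JOD}(\mathit{PopSeq}_{(p,\gamma,q)}) = \mathrm{IJOP}_{\mathcal{W}_e}(\mathit{paths}(\langle p, \gamma\rangle, \langle q, \varepsilon\rangle))$. Moreover,
$\mathrm{JOD}(\mathit{AcceptingSeq}[p\,\gamma_1\gamma_2\cdots\gamma_n]) = \mathrm{IJOP}_{\mathcal{W}_e}(\mathit{paths}(\langle p, \gamma_1\cdots\gamma_n\rangle, L(\mathcal{A})))$.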

The saturation procedure shown in Fig. 3.3 solves the abstract grammar of Fig. 3.9 for
PopSeq non-terminals. Thus, when the saturation procedure finishes, the weight on transition
$t$ is $w_t$ if and only if $w_t = \mathrm{JOD}(\mathit{PopSeq}_t)$. (See also Lem. 2.4.9.) Moreover, if $\mathcal{A}_{pre^*}$
is the resulting automaton, then $\mathcal{A}_{pre^*}(\langle p, \gamma_1\cdots\gamma_n\rangle) = \mathrm{JOD}(\mathit{AcceptingSeq}[p\,\gamma_1\gamma_2\cdots\gamma_n])$ because
the productions for AcceptingSeq are only computing the weight of accepting paths in
$\mathcal{A}_{pre^*}$. This proves that $\mathcal{A}_{pre^*}(c) = \mathrm{IJOP}_{\mathcal{W}_e}(c, L(\mathcal{A}))$.

Proof  of  Thm.  3.2.5 

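Figure 3.9 An abstract grammar problem for solving GPP on EWPDSs.

\[
\begin{array}{lll}
 & \text{Production} & \text{for each} \\
(1) & \mathit{PushRuleSeq}_{(q,\gamma,q')} \to \varepsilon & (q,\gamma,q') \in {\to_0} \\
(2) & \mathit{SameLevelRuleSeq}_{(p',\varepsilon,q)} \to \mathit{PushRuleSeq}_{(p,\gamma,q)}\ r & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\varepsilon\rangle \in \Delta,\ q \in Q_e \\
(3) & \mathit{PushRuleSeq}_{(p',\gamma,q')} \to \mathit{PushRuleSeq}_{(q,\gamma,q')}\ \mathit{SameLevelRuleSeq}_{(p',\varepsilon,q)} & p' \in P,\ q' \in Q_e \\
(4) & \mathit{PushRuleSeq}_{(p',\gamma',q)} \to \mathit{PushRuleSeq}_{(p,\gamma,q)}\ r & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\gamma'\rangle \in \Delta,\ q \in Q_e \\
(5) & \mathit{PushRuleSeq}_{(p',\gamma',q_{p',\gamma'})} \to \varepsilon & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\gamma'\gamma''\rangle \in \Delta \\
(6) & \mathit{PushRuleSeq}_{(q_{p',\gamma'},\gamma'',q)} \to \mathit{PushRuleSeq}_{(p,\gamma,q)}\ r & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\gamma'\gamma''\rangle \in \Delta,\ q \in Q_e \\
(7) & \mathit{AcceptingRuleSeq}[p\,\gamma_1\gamma_2\cdots\gamma_n] \to \mathit{PushRuleSeq}_{(p,\gamma_1,q_1)} \cdots \mathit{PushRuleSeq}_{(q_{n-1},\gamma_n,q_n)} & p \in P,\ \gamma_i \in \Gamma,\ q_i \in Q_e \text{ for } 1 \le i \le n,\ q_n \in F
\end{array}
\]

Figure 3.10 The AcceptingRuleSeq grammar for GPS, given PDS $\mathcal{P}$ and automaton $\mathcal{A}$.
The set $Q_e$ is defined as $(Q \cup Q_{mid})$.

The proof of Thm. 3.2.5 uses an argument similar to the one used in the proof of
Thm. 3.2.1. To simplify the proof, we assume that the weight on a call rule is always
$\overline{1}$. An EWPDS $\mathcal{W}_e$ that does not satisfy this restriction can always be converted to one, say
$\mathcal{W}_e'$, that does satisfy it as follows:

1. The semiring of $\mathcal{W}_e'$ is over pairs of weights with the operations defined componentwise,
i.e., the semiring is $((D, D), \oplus_p, \otimes_p, (\overline{0}, \overline{0}), (\overline{1}, \overline{1}))$, where $\oplus_p$ and $\otimes_p$ are
componentwise $\oplus$ and $\otimes$, respectively.

2. For every call rule $r = \langle p, \gamma\rangle \hookrightarrow \langle p', \gamma'\gamma''\rangle$ in $\mathcal{W}_e$ with weight $f(r)$ and merge function
$m_r$, add the rules $r_1 = \langle p, \gamma\rangle \hookrightarrow \langle p, \gamma_r\rangle$ and $r_2 = \langle p, \gamma_r\rangle \hookrightarrow \langle p', \gamma'\gamma''\rangle$ to $\mathcal{W}_e'$
with weights $(\overline{1}, f(r))$ and $(\overline{1}, \overline{1})$, respectively. The rule $r_2$ has the merge function
$\lambda(x_1, x_2).\lambda(y_1, y_2).(m_r(x_1, y_1), \overline{1})$.

3. For every other rule $r$ of $\mathcal{W}_e$, add $r$ to $\mathcal{W}_e'$ with weight $(f(r), \overline{1})$.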
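
The three-step conversion to call rules of unit weight can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in WPDS++ or WALi: the `Rule` record, the function name `to_unit_call_weights`, and the naming of the fresh stack symbol are all hypothetical, the merge function is written with two arguments rather than curried, and the example uses a min-plus style weight whose semiring one is `0`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    lhs: Tuple[str, str]               # <p, gamma>
    rhs: Tuple[str, Tuple[str, ...]]   # <p', stack symbols on the right-hand side>
    weight: object
    merge: Optional[Callable] = None   # present only on call (push) rules


def to_unit_call_weights(rules: List[Rule], one) -> List[Rule]:
    """Pair every weight so that each call rule ends up with weight (one, one)."""
    out: List[Rule] = []
    for i, r in enumerate(rules):
        if len(r.rhs[1]) == 2:                      # call rule: two symbols pushed
            p, _gamma = r.lhs
            mid = f"gamma_r{i}"                     # fresh stack symbol (hypothetical name)

            def merge2(x, y, m=r.merge):
                # Merge on the first components; the second component is reset to one.
                return (m(x[0], y[0]), one)

            out.append(Rule(r.lhs, (p, (mid,)), (one, r.weight)))      # r1
            out.append(Rule((p, mid), r.rhs, (one, one), merge2))      # r2
        else:
            out.append(Rule(r.lhs, r.rhs, (r.weight, one)))            # every other rule
    return out


# Hypothetical example: one call rule with weight 5 and one step rule with weight 7.
rules = [
    Rule(("p", "n3"), ("p", ("n6", "n4")), 5, lambda x, y: x + y),
    Rule(("p", "n1"), ("p", ("n2",)), 7),
]
converted = to_unit_call_weights(rules, one=0)
```

The call rule is split into `r1`, which carries the original weight in the second component, and `r2`, which carries the merge function and the unit weight.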

The pairing of weights that occurs in poststar is because of the above construction (but
the transitions on the new stack symbols $\gamma_r$ remain implicit).

The accepting rule-sequence grammar for GPS is shown in Fig. 3.10. The grammar
that over-approximates it is shown in Fig. 3.11, and its language is shown in Fig. 3.12. The
occurrences of $(D\ E)$ and $(C\ E)$ are where merge functions have to be applied. The abstract
grammar that solves GPS is shown in Fig. 3.13. Again, this abstract grammar justifies the
saturation procedure of Fig. 3.4: the latter is simply computing JOD values for the former.
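\[
\begin{array}{ll}
\text{Non-terminal} & \text{Over-approximates } \mathit{PushRuleSeq}_t \text{ for } t \text{ in} \\
A & P \times \Gamma \times (Q - P) \\
B & P \times \Gamma \times Q_{mid} \\
C & Q_{mid} \times \Gamma \times (Q - P) \\
D & Q_{mid} \times \Gamma \times Q_{mid} \\
E & P \times \{\varepsilon\} \times Q_{mid} \\
F & P \times \{\varepsilon\} \times (Q - P) \\
G & (Q - P) \times \Gamma \times (Q - P)
\end{array}
\]

\[
\begin{array}{lcl}
A & \to & \varepsilon \mid A\, R_1 \mid C\, E \mid F \\
B & \to & \varepsilon \mid B\, R_1 \mid D\, E \\
C & \to & A\, R_2 \\
D & \to & B\, R_2 \\
E & \to & G\, B\, R_0 \\
F & \to & G\, A\, R_0 \\
G & \to & \varepsilon
\end{array}
\]

Figure 3.11 A grammar that over-approximates the grammar shown in Fig. 3.10.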


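\[
\begin{array}{ll}
\text{Non-terminal} & \text{Language is over-approximated by} \\
A & (R_0\, \sigma_b)^* \\
B & \sigma_b \\
C & (R_0\, \sigma_b)^*\, R_2 \\
D & \sigma_b\, R_2 \\
E & \sigma_b\, R_0 \\
F & (R_0\, \sigma_b)^*\, R_0
\end{array}
\]

Figure 3.12 The language of strings derivable from the non-terminals of the grammar
shown in Fig. 3.11. Here $\sigma_b$ is the non-terminal of Fig. 3.2 that derives balanced sequences.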


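\[
\begin{array}{lll}
 & \text{Production} & \text{for each} \\
(1) & \mathit{PushSeq}_{(q,\gamma,q')} \to g_1(\varepsilon) & (q,\gamma,q') \in {\to_0} \\
 & g_1 = \overline{1} & \\
(2) & \mathit{SameLevelSeq}_{(p',\varepsilon,q)} \to g_2(\mathit{PushSeq}_{(p,\gamma,q)}) & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\varepsilon\rangle \in \Delta,\ q \in Q_e \\
 & g_2 = \lambda x.\, x \otimes f(r) & \\
(3a) & \mathit{PushSeq}_{(p',\gamma'',q')} \to g_3(\mathit{PushSeq}_{(q,\gamma'',q')}, \mathit{SameLevelSeq}_{(p',\varepsilon,q)}) & p' \in P,\ q' \in Q_e,\ q = q_{p,\gamma'} \in Q_{mid},\ r = \mathit{lookupPushRule}(p, \gamma', \gamma'') \\
 & g_3 = \lambda x.\lambda y.\, m_r(y, x) & \\
(3b) & \mathit{PushSeq}_{(p',\gamma,q')} \to g_3'(\mathit{PushSeq}_{(q,\gamma,q')}, \mathit{SameLevelSeq}_{(p',\varepsilon,q)}) & p' \in P,\ q \in Q - Q_{mid} \\
 & g_3' = \lambda x.\lambda y.\, y \otimes x & \\
(4) & \mathit{PushSeq}_{(p',\gamma',q)} \to g_4(\mathit{PushSeq}_{(p,\gamma,q)}) & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\gamma'\rangle \in \Delta,\ q \in Q_e \\
 & g_4 = \lambda x.\, x \otimes f(r) & \\
(5) & \mathit{PushSeq}_{(p',\gamma',q_{p',\gamma'})} \to g_5(\varepsilon) & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\gamma'\gamma''\rangle \in \Delta \\
 & g_5 = \overline{1} & \\
(6) & \mathit{PushSeq}_{(q_{p',\gamma'},\gamma'',q)} \to g_6(\mathit{PushSeq}_{(p,\gamma,q)}) & r = \langle p,\gamma\rangle \hookrightarrow \langle p',\gamma'\gamma''\rangle \in \Delta,\ q \in Q_e \\
 & g_6 = \lambda x.\, x & \\
(7) & \mathit{AcceptingSeq}[p\,\gamma_1\gamma_2\cdots\gamma_n] \to g_7(\mathit{PushSeq}_{(p,\gamma_1,q_1)}, \ldots, \mathit{PushSeq}_{(q_{n-1},\gamma_n,q_n)}) & p \in P,\ \gamma_i \in \Gamma,\ q_i \in Q_e \text{ for } 1 \le i \le n,\ q_n \in F \\
 & g_7 = \lambda x_1.\cdots\lambda x_n.\, x_n \otimes \cdots \otimes x_1 &
\end{array}
\]

Figure 3.13 An abstract grammar problem for solving GPS in an EWPDS. $m_r$ is the
merge function associated with rule $r$.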
Chapter 4

Faster  Interprocedural  Analysis  Using  WPDSs 

The  previous  chapter  described  various  abstract  models  and  their  analyses  for  finding 
the  set  of  reachable  states.  At  the  heart  of  all  these  analyses,  and  some  others  [81,  84,  88,  9, 
33,  85],  is  a  chaotic-iteration  strategy.  These  analyses  are  saturation  based:  they  use  some 
set  of  rules  in  order  to  saturate  the  currently  inferred  set  of  reachable  states.  The  analyses 
simply  state  that  the  rules  can  be  applied  in  any  order  (e.g.,  see  Section  2.3.2).  Thus,  their 
implementations  are  free  to  choose  the  rules  in  any  order.  The  strategy  of  choosing  any  rule 
at  random  is  called  the  chaotic-iteration  strategy. 

For instance, many standard algorithms for dataflow analysis are worklist-based. They
start with an initial value at the entry node and, at each step, propagate changes to a
successor node by picking an outgoing edge at random. Consider running this strategy on
the dataflow model shown in Fig. 4.1. Suppose each edge in the graph is labeled with a
dataflow transformer, and we want to solve for the JOP value for all paths from node $v_1$
to node $v_6$. Also suppose that the loop $(v_1\ v_2\ v_3)$ requires 10 iterations around the loop
to reach a fixpoint.

If the algorithm starts at node $v_1$ and propagates changes to $v_2$, then $v_3$, and so on to
$v_6$ before taking the backedge from $v_3$ to $v_1$, the algorithm would end up performing
6 × 10 = 60 operations (assuming one operation for propagating changes across a single
edge). The ideal way of computing the JOP value is to first saturate the loop and then
go outside it, i.e., the changes are propagated within the loop until a fixpoint is reached,
and then propagated to nodes $v_4$, $v_5$ and $v_6$ just once. This requires only 3 × 10 + 3 = 33
operations.

Figure 4.1 A simple dataflow model that has a graph with a loop.

The  general  observation  is  that  the  iteration  order  matters  for  the  total  running  time 
of  a  saturation-based  analysis.  Tarjan  gave  an  efficient  iteration  order  for  finite  graphs 
[91,  90]  that  applies  to  single-procedure  dataflow  models.  We  extend  that  algorithm  to  the 
interprocedural  setting.  To  provide  a  common  setting  to  discuss  most  of  the  above-mentioned 
analyses,  we  use  WPDSs  to  describe  our  improvement  to  the  chaotic-iteration  strategy.  Our 
techniques  also  apply  to  EWPDSs.  Besides  speeding  up  reachability,  our  techniques  also 
help  in  witness  generation,  differential  propagation,  and  incremental  computation  (Section 
4.2). 

Tarjan's algorithm [91, 90] works by efficiently converting a graph into a regular expression.
Evaluating the regular expression (under an appropriate interpretation in which
expression concatenation is interpreted as $\otimes$ and expression union is interpreted as $\oplus$) is
sufficient to solve for the desired JOP weight. Our technique generalizes this algorithm to
programs with multiple procedures as follows: for every procedure $p$, we introduce a variable
$X_p$, which represents the summary of the procedure, i.e., the net effect of all (valid) paths
that go from the entry of the procedure to its exit. Next, we replace procedure calls with
their summary, i.e., a call to a procedure $p$ in the ICFG is replaced with an intraprocedural
edge labeled with $X_p$. This results in a collection of graphs, one for each procedure. Next, we
use Tarjan's algorithm to obtain a set of equations: the graph for procedure $p$ is converted
to an equation $X_p = r_p$, where $r_p$ is the regular expression for the graph. The expressions
may depend on unknown variables. For example, if procedure $p$ calls procedures $q$ and $s$,
then $r_p$ may contain variables $X_q$ and $X_s$.

The resulting equations can be solved using a chaotic-iteration strategy: for each procedure
$j$, initialize the variable $X_j$ to $\bot$ (or the semiring weight $\overline{0}$); next, pick an equation
$X_i = r_i$, evaluate the expression $r_i$ and update the value of $X_i$, and repeat until the values
of all variables stop changing. This strategy would, however, give up most of the benefit of
using the regular expressions. We give an order in which the equations should be solved,
and also show how to speed up multiple evaluations of the same expression. This results in
an efficient interprocedural-analysis algorithm.
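
The baseline chaotic-iteration solver for the equations $X_p = r_p$ can be sketched as below. This is an illustration under stated assumptions rather than the algorithm developed in this chapter: the weight domain is taken to be binary relations over integers (combine is union, extend is relational composition), expressions are nested tuples without a star operator, concatenation is interpreted as plain extend, and the procedure names `main` and `f` and the constants `a`, `b`, `c` are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Rel = FrozenSet[Tuple[int, int]]
ZERO: Rel = frozenset()          # semiring zero: the empty relation


def combine(a: Rel, b: Rel) -> Rel:
    """Semiring combine, modeled as set union."""
    return a | b


def extend(a: Rel, b: Rel) -> Rel:
    """Semiring extend, modeled as relational composition (a, then b)."""
    return frozenset((x, z) for (x, y) in a for (y2, z) in b if y == y2)


def evaluate(expr, env: Dict[str, Rel]) -> Rel:
    """Evaluate a tuple-encoded expression using the current variable values."""
    tag = expr[0]
    if tag == "const":
        return expr[1]
    if tag == "var":
        return env[expr[1]]
    if tag == "union":
        return combine(evaluate(expr[1], env), evaluate(expr[2], env))
    if tag == "concat":
        return extend(evaluate(expr[1], env), evaluate(expr[2], env))
    raise ValueError(f"unknown expression tag: {tag}")


def solve(equations: Dict[str, tuple]) -> Dict[str, Rel]:
    """Re-evaluate every equation until no variable changes (round-robin order)."""
    env = {name: ZERO for name in equations}
    changed = True
    while changed:
        changed = False
        for name, expr in equations.items():
            new = combine(env[name], evaluate(expr, env))
            if new != env[name]:
                env[name] = new
                changed = True
    return env


a: Rel = frozenset({(3, 0)})
b: Rel = frozenset({(2, 3)})
c: Rel = frozenset({(0, 1), (1, 2)})
equations = {
    "main": ("concat", ("const", a), ("var", "f")),                       # X_main = a . X_f
    "f": ("union", ("const", b), ("concat", ("const", c), ("var", "f"))),  # X_f = b U c . X_f
}
solution = solve(equations)
```

The recursive equation for `f` needs several passes before `main` can pick up its final value, which is the kind of repeated re-evaluation that a good solving order avoids.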

We  also  show  how  to  reduce  a  WPDS  reachability  problem  (Defn.  2.4.4)  to  a  problem 
on  graphs  that  can  be  solved  in  a  similar  fashion  as  above.  When  the  PDS  underlying  the 
WPDS  is  obtained  from  an  ICFG  using  the  standard  encoding  (Fig.  2.4)  then  the  graphs 
coincide  with  the  control-flow  graphs  of  the  procedures  in  the  ICFG. 

The  contributions  of  the  work  presented  in  this  chapter  can  be  summarized  as  follows: 

•  We  present  a  new  reachability  algorithm  for  WPDSs  and  EWPDSs  that  improves  on 
previously  known  algorithms  for  PDS  reachability.  The  algorithm  is  asymptotically 
faster  when  the  PDS  is  regular  (decomposes  into  a  single  graph),  and  offers  substantial 
improvement  in  the  general  case  as  well.  (Section  4.1) 

• The algorithm is demand-driven, and computes only the information needed for answering
a particular user query. It has an implicit slicing stage where it disregards
parts of the program not needed for answering the user query.

•  We  show  that  other  analysis  questions,  namely  witness  tracing,  differential  propagation 
and  incremental  analysis,  carry  over  to  the  new  approach.  (Section  4.2) 

•  We  carried  out  experiments  on  three  very  different  applications  that  use  WPDSs  and 
obtained  substantial  speedups  for  each  of  them.  (Section  4.3) 

The  rest  of  this  chapter  is  organized  as  follows:  Section  4.1  presents  our  algorithm  for 
solving  reachability  queries  on  WPDSs  and  EWPDSs.  Section  4.2  describes  algorithms 
for  witness  tracing,  differential  propagation  and  incremental  analysis.  Section  4.3  presents 
experimental  results.  Section  4.4  describes  related  work. 




4.1  Solving  WPDS  Reachability  Problems 

In  this  section,  we  show  how  to  speed  up  backward  reachability  on  WPDSs  (i.e.,  GPP; 
Defn.  2.4.4).  Solving  forward  reachability  is  similar,  but  slightly  more  complicated. 

Recall  that  solving  GPP  involves  computing  the  join-over-all-derivations  (JOD)  value 
over  the  abstract  grammar  shown  in  Fig.  2.13.  We  will  convert  the  JOD  problem  into  one  of 
computing join-over-all-valid-paths (JOVP) over a graph similar to the ICFG, and then use
graph-based  techniques.  Our  technique  applies  to  solving  GPS  in  exactly  the  same  fashion 
by  using  the  abstract  grammar  for  GPS.  (We  use  GPP  in  this  section  because  its  abstract 
grammar  is  smaller.) 

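In this section, fix $\mathcal{W} = (\mathcal{P}, \mathcal{S}, f)$ as the WPDS, where $\mathcal{P} = (P, \Gamma, \Delta)$ is a pushdown
system and $\mathcal{S} = (D, \oplus, \otimes, \overline{0}, \overline{1})$ is the weight domain. Let the initial set of configurations be
ones that are accepted by the $\mathcal{P}$-automaton $\mathcal{A} = (Q, \Gamma, \to_0, P, F)$.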

Definition 4.1.1. A (directed) hypergraph is a generalization of a directed graph in which
generalized edges, called hyperedges, can have multiple sources, i.e., the source of an edge is
an ordered set of vertices. A transition dependence graph (TDG) for a grammar $G$ is a
hypergraph whose vertices are the non-terminals of $G$. There is a hyperedge from $(t_1, \cdots, t_n)$
to $t$ if $G$ has a production with $t$ on the left-hand side and $t_1 \cdots t_n$ are the non-terminals that
appear (in order) on the right-hand side.
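
Definition 4.1.1 translates directly into code. The sketch below is illustrative only: it assumes a grammar is given as a list of pairs (left-hand side, ordered right-hand-side non-terminals), and the names `build_tdg`, `t1`, ..., `t6` are hypothetical. Productions with no non-terminal on the right-hand side contribute a vertex but no hyperedge.

```python
from typing import List, Set, Tuple

Production = Tuple[str, List[str]]        # (left-hand side, RHS non-terminals in order)
Hyperedge = Tuple[Tuple[str, ...], str]   # (ordered source vertices, target vertex)


def build_tdg(productions: List[Production]) -> Tuple[Set[str], List[Hyperedge]]:
    """Build the transition dependence graph of a grammar."""
    vertices: Set[str] = set()
    hyperedges: List[Hyperedge] = []
    for lhs, rhs in productions:
        vertices.add(lhs)
        vertices.update(rhs)
        if rhs:
            # One hyperedge per production: from the RHS non-terminals to the LHS.
            hyperedges.append((tuple(rhs), lhs))
    return vertices, hyperedges


productions = [
    ("t1", ["t2"]),          # single source: an ordinary edge ((t2), t1)
    ("t3", ["t4", "t6"]),    # two sources: a proper hyperedge ((t4, t6), t3)
    ("t5", []),              # no non-terminals on the right-hand side
]
vertices, hyperedges = build_tdg(productions)
```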

If we construct the TDG of the grammar shown in Fig. 2.13 when the underlying PDS is
obtained from an ICFG, and the initial set of configurations is $\{\langle p, \varepsilon\rangle \mid p \in P\}$ (or $\to_0 = \emptyset$),
then the TDG is almost identical to the ICFG (with the edges reversed). There are two
differences in the way procedure calls are represented: the TDG has no analog of
exit-node-to-return-node edges, and one of the predecessors of a call-node is the corresponding
return-node. Fig. 4.2 shows an example (disregard the edge labels, nodes $t_{s1}$ and $t_{s2}$, and the
dotted edges in Fig. 4.2(c) for now). This can be observed from the fact that except for the
PDS states in Fig. 2.13, the transition dependencies are almost identical to the dependencies
encoded in the pushdown rules, which in turn come from ICFG edges; e.g., in Fig. 4.2, the
ICFG edge $(n_1, n_2)$ corresponds to the transition dependence $((t_2), t_1)$, and the call-return
pair $(n_3, n_6)$ and $(n_{12}, n_4)$ in the ICFG corresponds to the hyperedge $((t_4, t_6), t_3)$.

[Figure 4.2, panels (a) and (c): graphics not reproduced. Panel (b):]

\[
\begin{array}{ll}
(1) & \langle p, n_1\rangle \hookrightarrow \langle p, n_2\rangle \\
(2) & \langle p, n_2\rangle \hookrightarrow \langle p, n_3\rangle \\
(3) & \langle p, n_3\rangle \hookrightarrow \langle p, n_6\, n_4\rangle \\
(4) & \langle p, n_4\rangle \hookrightarrow \langle p, n_5\rangle \\
(5) & \langle p, n_5\rangle \hookrightarrow \langle p, \varepsilon\rangle \\
(6) & \langle p, n_6\rangle \hookrightarrow \langle p, n_7\rangle \\
(7) & \langle p, n_7\rangle \hookrightarrow \langle p, n_8\rangle \\
(8) & \langle p, n_8\rangle \hookrightarrow \langle p, n_9\rangle \\
(9) & \langle p, n_8\rangle \hookrightarrow \langle p, n_{12}\rangle \\
(10) & \langle p, n_9\rangle \hookrightarrow \langle p, n_{10}\rangle \\
(11) & \langle p, n_9\rangle \hookrightarrow \langle p, n_{11}\rangle \\
(12) & \langle p, n_{10}\rangle \hookrightarrow \langle p, n_9\rangle \\
(13) & \langle p, n_{11}\rangle \hookrightarrow \langle p, n_{12}\rangle \\
(14) & \langle p, n_{12}\rangle \hookrightarrow \langle p, \varepsilon\rangle
\end{array}
\]

Figure 4.2 (a) An ICFG. The e and exit nodes represent entry and exit points of
procedures, respectively. The program statements are written only for illustration purposes.
Dashed edges represent interprocedural control flow. (b) A PDS that models the
control flow of the ICFG. (c) The TDG for the WPDS whose underlying PDS is shown in
(b), assuming that rule number $i$ has weight $w_i$. The non-terminal $\mathit{PopSeq}_{(p,\gamma,p')}$ is shown as
simply $(p, \gamma, p')$. Let $t_j$ stand for the node $(p, n_j, p)$. The thick bold arrows form a single
hyperedge. Nodes $t_{s1}$ and $t_{s2}$ are root nodes, and the dashed arrow is a summary edge.

For  such  PDSs,  which  are  obtained  from  ICFGs,  constructing  the  TDGs  might  seem 
unnecessary  (because  the  ICFG  was  already  available)  but  it  allows  us  to  generalize  to  an 
arbitrary initial set of configurations, which defines a region of interest in the program. Moreover,
PDSs can encode a larger range of constructs than an ICFG, such as setjmp/longjmp
in C programs. However, it is still convenient to think of a TDG as an ICFG. In the rest of
this chapter, we illustrate the issues using the TDG of the grammar in Fig. 2.13. We reduce
the join-over-all-derivation problem on the grammar to a join-over-all-valid-paths problem
on its TDG.
4.1.1 Intraprocedural Iteration

We first consider TDGs of a special form: consider the intraprocedural case, i.e., there
are no hyperedges in the TDG (and correspondingly no push rules in the PDS). As an
example, assume that the TDG in Fig. 4.2 has only the part corresponding to procedure
foo() without any hyperedges. In such a TDG, if an edge $((t_1), t)$ was inserted because of
the production $t \to g(t_1)$ for $g = \lambda x.\, x \otimes w$ for some weight $w$, then label this edge with $w$.
Next, insert a special node $t_s$ into the TDG, and for each production of the form $t \to g(\varepsilon)$
with $g = w$, insert the edge $((t_s), t)$ and label it with weight $w$. The node $t_s$ is called a root node. This
gives us a graph with weights on each edge (see footnote 1). Define the weight of a path in this graph in the
standard (but reversed) way: the weight of a path is the extend of weights on its constituent
edges in the reverse order. It is easy to see that $\mathrm{JOD}(t) = \bigoplus\{v(\eta) \mid \eta \in \mathit{paths}(t_s, t)\}$, where
$\mathit{paths}(t_s, t)$ is the set of all paths from $t_s$ to $t$ in the TDG and $v(\eta)$ is the weight of the path
$\eta$. To solve for JOD, we could still use chaotic iteration, but instead we will make use of
Tarjan's path-expression algorithm [90].

Problem 1. Given a directed graph $G$ and a fixed vertex $s$, the single-source path expression
(SSPE) problem is to compute a regular expression that represents $\mathit{paths}(s, v)$ for
all vertices $v$ in the graph. The syntax of regular expressions is as follows:
$r ::= \emptyset \mid \varepsilon \mid e \mid r_1 \cup r_2 \mid r_1 . r_2 \mid r^*$, where $e$ stands for an edge in $G$.

We can use any algorithm for SSPE to compute regular expressions for $\mathit{paths}(t_s, t)$, which
gives us a complete description of the set of paths that we need to consider. Moreover,
the Kleene-star operator in the regular expressions identifies loops in the TDG. Let $\otimes_c$ be
the reverse of $\otimes$, i.e., $w_1 \otimes_c w_2 = w_2 \otimes w_1$. To compute $\mathrm{JOD}(t)$, we interpret the regular
expression for $\mathit{paths}(t_s, t)$ as an expression over the weight domain: replace each edge $e$ with
its weight, $\emptyset$ with $\overline{0}$, $\varepsilon$ with $\overline{1}$, $\cup$ with $\oplus$, and $.$ with $\otimes_c$; and then evaluate the expression. The
weight $w^*$ is computed as $\overline{1} \oplus w \oplus (w \otimes w) \oplus \cdots$; because of the no-infinite-ascending-chain
property of the semiring, this iteration converges. The two main advantages of using regular
expressions to compute $\mathrm{JOD}(t)$ are: First, loops are identified in the expression, and the
evaluation of an expression is forced to saturate a loop before exiting it. Second, we can
compute $w^*$ faster than using the normal iteration sequence. For this, observe that

\[
(\overline{1} \oplus w)^n = \overline{1} \oplus w \oplus w^2 \oplus \cdots \oplus w^n,
\]

where exponentiation is defined using $\otimes$, i.e., $w^0 = \overline{1}$ and $w^i = w \otimes w^{i-1}$. Then $w^*$ can be
computed by repeatedly squaring $(\overline{1} \oplus w)$ until it converges. If $w^* = \overline{1} \oplus w \oplus \cdots \oplus w^n$ then it
can be computed in $O(\log n)$ operations. A chaotic-iteration strategy would take $O(n)$ steps
to compute the same value. In other words, having a closed representation of loops provides
an exponential speedup (see footnote 2).

Footnote 1: A hypergraph reduces to a graph when all hyperedges have a single source node.
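
The interpretation of a path expression over the weight domain, including the repeated-squaring computation of $w^*$, can be sketched as follows. This is a minimal illustration under stated assumptions: the weight domain is binary relations over a small fixed universe (combine is union, extend is relational composition, so the domain has finite height), regular expressions are encoded as nested tuples, and all names are hypothetical.

```python
from typing import FrozenSet, Tuple

Rel = FrozenSet[Tuple[int, int]]
UNIVERSE = range(4)
ZERO: Rel = frozenset()                                   # semiring zero
ONE: Rel = frozenset((i, i) for i in UNIVERSE)            # semiring one (identity)


def combine(a: Rel, b: Rel) -> Rel:
    return a | b


def extend(a: Rel, b: Rel) -> Rel:
    return frozenset((x, z) for (x, y) in a for (y2, z) in b if y == y2)


def extend_rev(a: Rel, b: Rel) -> Rel:
    """The reversed extend: a (x)_c b = b (x) a."""
    return extend(b, a)


def star(w: Rel) -> Rel:
    """Compute w* by repeatedly squaring (ONE combine w) until it stops changing."""
    acc = combine(ONE, w)
    while True:
        squared = extend(acc, acc)
        if squared == acc:
            return acc
        acc = squared


def evaluate(regex) -> Rel:
    """Interpret a tuple-encoded path expression over the weight domain."""
    tag = regex[0]
    if tag == "empty":
        return ZERO
    if tag == "eps":
        return ONE
    if tag == "edge":
        return regex[1]                                    # the weight labeling the edge
    if tag == "union":
        return combine(evaluate(regex[1]), evaluate(regex[2]))
    if tag == "concat":
        return extend_rev(evaluate(regex[1]), evaluate(regex[2]))
    if tag == "star":
        return star(evaluate(regex[1]))
    raise ValueError(f"unknown regex tag: {tag}")


chain: Rel = frozenset({(0, 1), (1, 2), (2, 3)})
closure = evaluate(("star", ("edge", chain)))
a: Rel = frozenset({(0, 1)})
b: Rel = frozenset({(1, 2)})
```

Because combine is idempotent in this domain, each squaring doubles the path length covered, so the loop in `star` runs a logarithmic number of times in the length of the longest path that contributes to the closure.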

Tarjan's path-expression algorithm solves the SSPE problem efficiently. It uses dominators
to construct the regular expressions for SSPE. This has the effect of computing the
weight on the dominators of a node before computing the weight on the node itself. This
avoids unnecessary propagation of weights to the node (which is the case, for instance, when
one exits a loop too early). Given a graph with $m$ edges (or $m$ grammar productions in our
case) and $n$ nodes (or non-terminals), regular expressions for $\mathit{paths}(t_s, t)$ can be computed for
all nodes $t$ in time $O(m \log n)$ when the graph is reducible. Evaluating these expressions will
take an additional $O(m \log n \log h)$ semiring operations, where $h$ is the height of the semiring
(see footnote 3). These expressions are represented using shared DAGs, i.e., expressions for $\mathit{paths}(t_s, t_1)$
and $\mathit{paths}(t_s, t_2)$ can share common sub-expressions, even when $t_1 \neq t_2$. The combined size
of all the regular expressions is bounded by the time taken to find the expressions; i.e., the
combined size is $O(m \log n)$.

Because most high-level languages are well-structured, their ICFGs are mostly reducible.
When the graph is not reducible, the running time degrades to $O((m \log n + k) \log h)$ semiring
operations, where k is the sum of the cubes of the sizes of dominator-strong components of the
graph. In the worst case, k can be $O(n^3)$. In our experiments, we seldom found irreducibility
to be a problem: k/n was a small constant. A pure chaotic-iteration strategy would take
$O(m\,h)$ semiring operations in the worst case. Comparing these complexities, we can expect
the algorithm that uses path expressions to be much faster than chaotic iteration, and the
benefit will be greater as the height of the semiring increases.

2 This assumes that each semiring operation takes the same amount of time. In the absence of any
assumption on the semiring being used, we aim to decrease the number of semiring operations. In some cases,
e.g., BDD-based weight domains, repeated squaring may not reduce the overall running time. However, the
user can supply a procedure for computing $w^*$ whenever there is a more efficient way of computing it than
by using a simple iteration sequence [63].

3 As usual, we assume the height to be bounded while discussing complexity results.

4.1.2  Interprocedural  Iteration 

We now generalize our algorithm to any TDG. For each hyperedge $((t_1, t_2), t)$, delete it
from the graph and replace it with the edge $((t_1), t)$. This new edge is called a summary edge,
and node $t_2$ is called an out-node. Out-nodes will be used to represent the summary weight
of a procedure. For example, in Fig. 4.2, we would delete the hyperedge $((t_4, t_6), t_3)$ and
replace it with $((t_4), t_3)$. The new edge is called a summary edge because it crosses a call-site
(from a return node to a call node) and will be used to summarize the effect of a procedure
call. Node $t_6$ is an out-node and will supply the summary weight of procedure foo. The
resultant TDG is a collection of connected graphs, with each graph roughly corresponding
to a procedure. In Fig. 4.2, the transitions that correspond to procedures main and foo get
split. Each connected graph is called an intragraph. For each intragraph, we introduce a
root node as before, and add edges from the root node to all nodes that have $\epsilon$-productions.
The weight labels are also added as before. For a summary edge $((t_1), t)$ obtained from a
hyperedge $((t_1, t_2), t)$ with associated production function $g = \lambda x.\lambda y.\,w \otimes x \otimes y$, label it with
$w \otimes t_2$ or, equivalently, $t_2 \otimes_c w$.
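The transformation above can be sketched in a few lines. The code assumes a TDG given as plain lists of ordinary edges and two-source hyperedges; the function names, the tuple layout of the labels, and the small example graph (shaped loosely after the description of Fig. 4.2, not copied from it) are all hypothetical.

```python
def to_summary_edges(edges, hyperedges):
    """Replace each hyperedge ((t1, t2), t) by the summary edge ((t1), t).

    edges:      list of (src, dst, weight)
    hyperedges: list of ((t1, t2), t, weight)
    Returns the labeled edges and the set of out-nodes.
    """
    labeled = [(src, dst, ("weight", w)) for (src, dst, w) in edges]
    out_nodes = set()
    for (t1, t2), t, w in hyperedges:
        out_nodes.add(t2)
        # The label stands for the symbolic expression "w extend t2".
        labeled.append((t1, t, ("summary", w, t2)))
    return labeled, out_nodes


def intragraphs(nodes, labeled):
    """Connected components of the transformed graph, found with union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    for src, dst, _ in labeled:
        parent[find(src)] = find(dst)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


# Hypothetical TDG: t1..t5 belong to one procedure, t6..t8 to a callee.
nodes = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"]
edges = [("t5", "t4", "w4"), ("t3", "t2", "w2"), ("t2", "t1", "w1"),
         ("t8", "t7", "w7"), ("t7", "t6", "w6")]
hyperedges = [(("t4", "t6"), "t3", "w3")]

labeled, out_nodes = to_summary_edges(edges, hyperedges)
graphs = intragraphs(nodes, labeled)
```

After the transformation the callee's nodes are no longer connected to the caller's: the only remaining link is the out-node name stored in the summary-edge label.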

This gives us a collection of intragraphs with edges labeled with either a weight or a
simple expression over an out-node. To solve for the JOD value, we construct a set of
regular equations, which we call out-node equations. For an intragraph G, let $t_G$ be its
unique root node. Then, for each out-node $t_o$ in G, construct the regular expression for
all paths in G from $t_G$ to $t_o$, i.e., for $paths(t_G, t_o)$. In this expression, replace each edge
with its corresponding label. If the resulting expression is r and it contains labels $t_1$ to $t_n$,
then add the equation $t_o = r(t_1, \cdots, t_n)$ to the set of out-node equations. Repeat this for
all intragraphs. For example, for the TDG shown in Fig. 4.2, assuming that $t_1$ is also an
out-node, we would obtain the following out-node equations.4

$$t_6 = w_{14}.(w_9 \oplus w_{13}.w_{11}.(w_{12}.w_{10})^*.w_8).w_7.w_6$$

$$t_1 = w_5.w_4.(t_6.w_3).w_2.w_1$$

Here we have used . as a shorthand for $\otimes_c$.

The resulting set of out-node equations describes all hyperpaths in the TDG to an out-node
from the collection of all root nodes. Hence, the JOD value of the out-nodes is the
least fix-point of these equations (with respect to $\sqsubseteq$ of Defn. 2.4.1(4)).

One way to solve these equations is by using chaotic iteration: start by initializing each
out-node with 0 (the least element in the semiring) and update the values of out-nodes by
repeatedly solving the equations until they converge. However, learning from our previous
observations, we give a direction to the iteration strategy. This can be done using regular
expressions on the dependence graph of the equations as follows. For each equation
$t_o = r(t_1, \cdots, t_n)$, produce the edges $t_i \rightarrow t_o$, $1 \le i \le n$, and construct a graph from these edges.
Label each edge with the expression (r) that it came from. Assume any out-node to be the
source node and construct a regular expression to all other nodes using SSPE again. These
expressions give the order in which equations have to be evaluated. For example, consider
the following set of equations on three out-nodes:

$$t_1 = r_1(t_1, t_3) \qquad t_2 = r_2(t_1) \qquad t_3 = r_3(t_2)$$

Then a possible regular expression for paths from $t_1$ to itself would be $(r_1 \cup r_2.r_3.r_1)^*$. This
suggests that to solve for $t_1$ we should use the following evaluation strategy: evaluate $r_1$,
update $t_1$, then evaluate $r_2$, $r_3$, and $r_1$, and update $t_1$ again, repeating this until the
solution converges.

4 The equations might be different depending on how the SSPE problem was solved, but all such equations
would have the same solution.

In our implementation, we use a simpler strategy that still turns out to be efficient in
practice. We take a strongly connected component (SCC) decomposition of the dependence
graph and solve all equations in one component, using chaotic iteration, before moving on
to the equations in the next component (in a topological order). This is efficient because
SCCs in the dependence graph correspond to sets of mutually recursive procedures, and
these groups tend to be quite small in practice.
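A minimal sketch of this SCC-ordered strategy follows. It assumes each out-node equation is given as a Python function over the current assignment together with an explicit dependency list, and it again uses relations under union and composition as a stand-in weight domain. The equation set and all names are hypothetical.

```python
def compose(a, b):
    """Extend for relations: pairs (x, z) with (x, y) in a and (y, z) in b."""
    return frozenset((x, z) for (x, y) in a for (y2, z) in b if y == y2)


def sccs_dependencies_first(deps):
    """Tarjan's SCC algorithm on the graph 'node -> nodes it depends on'.

    Components are emitted so that a component appears only after every
    component it depends on.
    """
    index, low, stack, on_stack, order = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for u in deps[v]:
            if u not in index:
                visit(u)
                low[v] = min(low[v], low[u])
            elif u in on_stack:
                low[v] = min(low[v], index[u])
        if low[v] == index[v]:          # v is the root of a component
            comp = []
            while True:
                u = stack.pop()
                on_stack.discard(u)
                comp.append(u)
                if u == v:
                    break
            order.append(comp)

    for v in deps:
        if v not in index:
            visit(v)
    return order


def solve(equations, deps, zero):
    """Chaotic iteration inside each SCC, SCCs taken in dependency order."""
    value = {t: zero for t in equations}
    for comp in sccs_dependencies_first(deps):
        changed = True
        while changed:
            changed = False
            for t in comp:
                new = equations[t](value)
                if new != value[t]:
                    value[t], changed = new, True
    return value


A, B = frozenset({(0, 1)}), frozenset({(1, 2)})
equations = {
    "t_top": lambda v: compose(v["t_a"], B),
    "t_a": lambda v: v["t_leaf"] | compose(v["t_b"], B),
    "t_b": lambda v: v["t_a"],          # t_a and t_b are mutually recursive
    "t_leaf": lambda v: A,
}
deps = {"t_top": ["t_a"], "t_a": ["t_leaf", "t_b"], "t_b": ["t_a"], "t_leaf": []}
solution = solve(equations, deps, frozenset())
```

Only the component containing `t_a` and `t_b` needs more than one pass; `t_leaf` and `t_top` are each settled after their dependencies have converged.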

Each regular expression in the out-node equations summarizes all paths in an intragraph,
and can be quite large. Therefore, we want to avoid evaluating them repeatedly while solving
the equations. To this end, we incrementally evaluate the regular expressions: only the part
of an expression that contains a modified out-node is reevaluated. The algorithm is given in
Fig. 4.4. Whenever this algorithm is used on a regular expression, the whole expression may
be traversed, but weight operations are performed only on nodes whose sub-expression
contains a modified out-node.

A regular expression is represented using its abstract-syntax tree (AST), where leaves
are weights or out-nodes, and internal nodes correspond to $\oplus$, $\otimes$, or $*$. A possible AST for
the regular expression for out-node $t_1$ of Fig. 4.2 is shown in Fig. 4.3. Whenever the value
of out-node $t_6$ is updated, one only needs to reevaluate the weight of subtrees at $a_4$, $a_3$, and
$a_1$, and update the value of out-node $t_1$ to the weight at $a_1$.

Figure 4.3 An AST for $w_5.w_4.(w_3 \otimes t_6).w_2.w_1$. Internal nodes for $\otimes_c$ are converted into $\otimes$
nodes by reversing the order of their children. Internal nodes in this AST have been given
names $a_1$ to $a_5$.

One complication that we face here is that the ASTs actually have a shared-DAG representation
to allow different expressions to share common sub-expressions. (This is a requirement
of Tarjan's path-expression algorithm.) Thus, our algorithm needs to take care of two
aspects: it should benefit from the sharing in the DAGs as much as possible, and it must
also be able to identify the part of an expression that is modified when the weight of an
out-node is updated. For this, we maintain, at each DAG node, two integers, last_change
and last_seen, as well as the weight, weight, of the subdag rooted at the node. We assume
that all regular expressions share the same leaves for out-nodes. We keep a global counter
update_count that is incremented each time the weight of some out-node is updated. Our
incremental evaluation algorithm is shown in Fig. 4.4. After calling evaluate(r), the weight
r.weight is the correct updated weight of expression r, no matter how many times the
weights of out-nodes were updated since the last call to evaluate on r.

For a node, the counter last_change records the last value of update_count for which the
weight of its subdag changed, and the counter last_seen records the value of update_count
when the subdag was last reevaluated. When the weight of an out-node is changed, its
corresponding leaf node is updated with that weight, update_count is incremented, and both of
the out-node's counters (last_change and last_seen) are set to update_count.

This incremental-evaluation algorithm is used as follows: we solve the out-node equations
in the same order as described earlier, but as the algorithm iterates over the equations,
whenever it picks an equation t = r, it calls evaluate(r) to compute the weight of r incrementally;
next, it updates the value of t to this weight and increments update_count. These steps are
repeated for each out-node equation.
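The bookkeeping with last_change, last_seen, and update_count can be made concrete with a small sketch. It follows the description above, but the class layout, the initial counter values for unevaluated nodes (-1), the operation counter `ops`, and the relational weight domain are assumptions made for illustration only.

```python
def compose(a, b):
    """Extend for relations: pairs (x, z) with (x, y) in a and (y, z) in b."""
    return frozenset((x, z) for (x, y) in a for (y2, z) in b if y == y2)


ONE = frozenset((i, i) for i in range(4))


def star(w):
    acc = ONE
    while True:
        nxt = ONE | compose(acc, w)
        if nxt == acc:
            return acc
        acc = nxt


OPS = {"combine": lambda a, b: a | b, "extend": compose, "star": star}


class Node:
    """A node of the shared expression DAG."""
    def __init__(self, kind, kids=(), weight=None):
        self.kind, self.kids, self.weight = kind, kids, weight
        is_leaf = kind == "leaf"
        # Internal nodes start at -1 so that the first evaluation computes them.
        self.last_change = 0 if is_leaf else -1
        self.last_seen = 0 if is_leaf else -1


class Evaluator:
    def __init__(self):
        self.update_count = 0
        self.ops = 0                      # number of weight operations performed

    def set_out_node(self, leaf, weight):
        leaf.weight = weight
        self.update_count += 1
        leaf.last_change = leaf.last_seen = self.update_count

    def evaluate(self, r):
        if r.last_seen == self.update_count or r.kind == "leaf":
            return
        for kid in r.kids:
            self.evaluate(kid)
        m = max(kid.last_change for kid in r.kids)
        if m > r.last_seen:               # some child changed since r was last seen
            self.ops += 1
            w = OPS[r.kind](*(kid.weight for kid in r.kids))
            if w != r.weight:
                r.last_change = m
                r.weight = w
        r.last_seen = self.update_count


# Expression (A . B) combine (T . C)*, where T is an out-node leaf that starts at 0.
A = Node("leaf", weight=frozenset({(0, 1)}))
B = Node("leaf", weight=frozenset({(1, 2)}))
C = Node("leaf", weight=frozenset({(2, 3)}))
T = Node("leaf", weight=frozenset())
root = Node("combine", (Node("extend", (A, B)),
                        Node("star", (Node("extend", (T, C)),))))

ev = Evaluator()
ev.evaluate(root)
ops_first = ev.ops                        # every internal node is computed once
ev.set_out_node(T, frozenset({(0, 2)}))
ev.evaluate(root)
ops_second = ev.ops - ops_first           # only nodes above T are recomputed
```

In this example the first call performs four weight operations and the second performs three: the sub-expression A . B is traversed again but not recomputed.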

Once  we  solve  for  the  values  of  the  out-nodes,  we  can  change  the  out-node  labels  on 
summary  edges  in  the  intragraphs  and  replace  them  with  their  corresponding  weight.  Then 
the  JOD  values  for  other  nodes  in  the  TDG  can  be  obtained  as  in  the  intraprocedural  version 
by  considering  each  intragraph  in  isolation. 

    procedure evaluate(r)
    begin
      if r.last_seen == update_count then
        return
      case r = w, r = t_o:
        return
      case r = (r1)*:
        evaluate(r1)
        if r1.last_change > r.last_seen then
          w = (r1.weight)*
          if r.weight ≠ w then
            r.last_change = r1.last_change
            r.weight = w
        r.last_seen = update_count
      case r = r1 ⊙ r2:
        evaluate(r1)
        evaluate(r2)
        m = max{r1.last_change, r2.last_change}
        if m > r.last_seen then
          w = r1.weight ⊙ r2.weight
          if r.weight ≠ w then
            r.last_change = m
            r.weight = w
        r.last_seen = update_count
    end

Figure 4.4 Incremental evaluation algorithm for regular expressions. Here ⊙ stands for
either ⊕ or ⊗.

The time required for solving this system of equations depends on the reducibility of
the intragraphs. Let $S_G$ be the time required to solve SSPE on intragraph G, i.e.,
$S_G = O(m \log n + k)$, where k is $O(n^3)$ in the worst case, but is ignorable in practice. If the equations
do not have any cyclic dependencies (corresponding to no mutually recursive procedures),
then the running time is $\sum_G S_G \log h$, where the sum ranges over all intragraphs, because each
equation has to be solved exactly once. In the presence of recursion, we use the observation
that the weight of each subdag in a regular expression can change at most h times while
the equations are being solved. Because the size of a regular expression obtained from an
intragraph G is bounded by $S_G$, the worst-case time for solving the equations is $\sum_G S_G\,h$.
This bound is very pessimistic and is actually worse than that of chaotic iteration. Here
we did not make use of the fact that incrementally reevaluating regular expressions is much
faster than reevaluating them from scratch. For a regular expression with one modified
out-node, we only need to perform semiring operations for each node on the path from the
out-node leaf to the root of the expression. For a nearly balanced regular expression tree, this
path to the root can be as short as $\log S_G$. Empirically, we found that incrementally reevaluating the
expression required many fewer operations than reevaluating the expression from scratch.

Unlike  with  chaotic  iteration,  where  the  weights  of  all  TDG  nodes  are  computed,  we 
only  need  to  compute  the  weights  on  out-nodes.  The  weights  for  the  rest  of  the  nodes 
can  be  computed  lazily  by  evaluating  their  corresponding  regular  expression  when  needed. 
For  applications  that  just  require  the  weight  for  a  few  TDG  nodes,  this  gives  us  additional 
savings.  We  also  limit  the  computation  of  weights  of  out-nodes  to  only  those  intragraphs 
that  contain  a  TDG  node  whose  weight  is  required.  This  corresponds  to  slicing  the  out-node 
equations  with  respect  to  the  user  query,  which  rules  out  computation  in  procedures  that 
are  irrelevant  to  the  query.  Moreover,  the  algorithm  can  be  executed  on  multi-processor 
machines  by  assigning  each  intragraph  to  a  different  processor.  The  only  communication 
required  between  the  processors  would  be  the  weights  on  out-nodes  while  they  are  being 
saturated. 

4.1.3  Solving  EWPDS  Reachability  Problems 

Reachability problems for EWPDSs are also based on abstract grammars, similar to the
ones for WPDSs. Thus, we can easily adapt our algorithm to EWPDSs. The abstract
grammars for GPP and GPS on EWPDSs are shown in Figs. 3.9 and 3.13, respectively.

These grammars only differ from those for WPDSs in the application of the merge function.
This difference can be handled as follows: to solve GPP, for hyperedges in the TDG
corresponding to case 4 of Fig. 3.9, if $t_o$ is the out-node, then label the corresponding summary
edge with $m_r(1, t_o)$ (in keeping with the production function $g_4$). We use EWPDSs in
our experiments.

4.2 Solving other WPDS Problems

In  this  section,  we  give  algorithms  for  some  important  PDS  problems:  witness  tracing, 
differential  propagation,  and  incremental  analysis.  Of  these  three,  only  witness  tracing  and 
differential  propagation  have  been  discussed  before  for  WPDSs  [83]. 

4.2.1  Witness  Tracing 

For program-analysis tools, if a program does not satisfy a property, it is often useful
to provide a justification of why the property was not satisfied. In terms of WPDSs, it
amounts to reporting a set of paths, or rule sequences, that together justify the reported
weight for a configuration. Formally, using the notation of Defn. 2.4.4, the witness tracing
problem for GPP(C) is to find, for each configuration c, a set
$\omega(c) \subseteq \bigcup_{c' \in C} paths(c, c')$ such that $\bigoplus_{\sigma \in \omega(c)} v(\sigma) = \delta(c, C)$.
This definition of witness tracing does not impose any restrictions
on the size of the reported witness set because any compact representation of the set suffices
for most applications. The algorithm for witness tracing for GPP [83] requires $O(|Q|^2\,|\Gamma|\,h)$
memory. Our algorithm only requires $O(|ON|\,D\,h)$ memory, where $|ON|$ is the number of
out-nodes and D is the maximum number of out-nodes that appear on the right-hand side of
an out-node equation. Typically, $|ON|$ is the number of procedures, which is much smaller
than $|\Gamma|$, and D is the maximum number of call sites in any procedure, which is usually
a small constant. One can consider $|ON|\,D$ to be roughly the size of the call graph of a
program. Essentially, the idea behind our algorithm is to perform a two-level staging, where
only a subset of the witness information needs to be kept in memory, and the rest can be
computed on demand.

In our new GPP algorithm, we already compute regular expressions that describe all
paths in an intragraph. In the intragraphs, we label each edge with not just a weight, but
also the rule that justifies the edge. Push rules will be associated with summary edges and
pop rules with edges that originate from a root node. Edges from the root node that were
inserted because of production (1) in Fig. 2.13 are not associated with any rule (or with an
empty rule sequence). After solving SSPE on the intragraphs, we can replace each edge with
the corresponding rule label. This gives us, for each out-node, a regular expression in terms
of other out-nodes that captures the set of all rule sequences that can reach that out-node.

While solving the out-node equations, we record the weights on out-nodes; i.e., when we
solve the equation $t_o = r(t_1, \cdots, t_n)$, we record the weights on $t_1, \cdots, t_n$ (say $w_1, \cdots, w_n$)
whenever the weight on $t_o$ changes to, say, $w_o$, by saving the tuple $(t_o, w_o, t_1, w_1, \cdots, t_n, w_n)$
to memory. Then the set of rule sequences to create transition $t_o$ with weight $w_o$ is given
by the expression r (where we replace TDG edges with their rule labels) by replacing each
out-node $t_i$ with the regular expression for all rule sequences used to create $t_i$ with weight $w_i$
(obtained recursively). This gives a regular expression for the witness set of each out-node.
Witness sets for other transitions can be obtained by solving SSPE on the intragraphs by
replacing out-node labels with their witness-set expression.

We only require $O(|ON|\,D\,h)$ space for recording witnesses because we just have to
remember the history of weights on out-nodes, and each piece of information is at most a
(2D + 2)-ary tuple.

4.2.2  Differential  Propagation 

The general framework of WPDSs can sometimes be inefficient for certain analyses. While
executing GPP, when the weight of a transition changes from $w_1$ to $w_2 = w_1 \oplus w$, the new
weight $w_2$ is propagated to other transitions. However, because the weight $w_1$ had already
been propagated, this will do extra work by propagating $w_1$ again when only $w$ (or a part of
$w$) needs to be propagated. This simple observation can be incorporated into WPDSs when
the semiring weight domain has a special subtraction operation (called diff, denoted by $-$)
[83]. The diff operator must satisfy the following properties: For each $a, b, c \in D$,

$$a \oplus (b - a) = a \oplus b \tag{4.1}$$

$$(a - b) - c = a - (b \oplus c) \tag{4.2}$$

$$a \oplus b = a \iff b - a = 0 \tag{4.3}$$

For example, for the relational weight domain (Defn. 2.4.13), set difference (when relations
are considered to be sets of tuples) satisfies all of the above properties.
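The claim about set difference can be checked exhaustively on a small universe. The sketch below assumes weights are sets of tuples with union as combine and the empty set as 0; the function names and the three-tuple universe are hypothetical.

```python
from itertools import combinations


def powerset(universe):
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diff_axioms_hold(universe):
    """Check Eqns. (4.1)-(4.3) with combine = union and diff = set difference."""
    zero = frozenset()
    subsets = powerset(universe)
    for a in subsets:
        for b in subsets:
            if a | (b - a) != a | b:                    # Eqn. (4.1)
                return False
            if ((a | b) == a) != ((b - a) == zero):     # Eqn. (4.3), both directions
                return False
            for c in subsets:
                if (a - b) - c != a - (b | c):          # Eqn. (4.2)
                    return False
    return True


holds = diff_axioms_hold({(0, 0), (0, 1), (1, 0)})
```

This is a finite sanity check rather than a proof, but each property also follows directly from elementary set algebra.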

We make use of the diff operation while solving the set of regular equations. In addition to
incrementally computing the regular expressions, we also incrementally compute the weights.
When the weight of an out-node changes from $w_1$ to $w_2$, we associate its corresponding leaf
node with the change $w_2 - w_1$. This change is then propagated to other nodes. If the
weights of expressions $r_1$ and $r_2$ are $w_1$ and $w_2$, respectively, and they change by $d_1$ and $d_2$,
then the weights of the following kinds of expressions change as follows:

$$r_1 \cup r_2 \;:\; d_1 \oplus d_2$$

$$r_1.r_2 \;:\; (d_1 \otimes_c d_2) \oplus (d_1 \otimes_c w_2) \oplus (w_1 \otimes_c d_2)$$

$$r_1^* \;:\; (w_1 \oplus d_1)^* - w_1^*$$

There is no better way of computing the change for Kleene-star (chaotic iteration suffers
from the same problem), but we can use the diff operator to compute the Kleene-star of a
weight as shown in Fig. 4.5.
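The change rules for union and concatenation can be read off from the semiring laws. As a short worked derivation (a sketch that uses only associativity, commutativity, and distributivity, with the old weights written as $w_1, w_2$ and the changes as $d_1, d_2$):

```latex
\begin{align*}
(w_1 \oplus d_1) \oplus (w_2 \oplus d_2)
  &= (w_1 \oplus w_2) \oplus (d_1 \oplus d_2) \\
(w_1 \oplus d_1) \otimes_c (w_2 \oplus d_2)
  &= (w_1 \otimes_c w_2) \oplus
     \bigl[ (d_1 \otimes_c d_2) \oplus (d_1 \otimes_c w_2) \oplus (w_1 \otimes_c d_2) \bigr]
\end{align*}
```

In both lines the first term on the right is the old weight of the expression, so combining the old weight with the remaining terms yields the new weight. Kleene-star has no such decomposition, which is why its change is stated as a difference of two closures.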

Theorem 4.2.1. The procedure Kleene-star, defined in Fig. 4.5, when applied to weight w,
returns $w^*$.

Proof. The proof is by induction. But, first, we need some auxiliary properties of diff.

$$a - b = 0 \text{ and } b - a = 0 \implies a = b \tag{4.4}$$

$$(a \oplus b) - a = b - a \tag{4.5}$$

Eqn. (4.4): This follows from Eqn. (4.3): $a - b = 0$ implies $(b \oplus a) = b$, and $b - a = 0$
implies that $(a \oplus b) = a$. Because $\oplus$ is commutative, $a = b$.

    1  procedure Kleene-star(w)
    2  begin
    3    wstar = del = 1
    4    while del ≠ 0
    5      temp = del ⊗ w
    6      del = temp − wstar
    7      wstar = wstar ⊕ temp
    8    return wstar
    9  end

Figure 4.5 Procedure for computing the Kleene-star of a weight using the diff operation
on weights.


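Eqn. (4.5): To prove this equality, we will show that (lhs − rhs) = 0 and (rhs − lhs) = 0.
Then Eqn. (4.4) shows that lhs = rhs.

$$\begin{aligned}
((a \oplus b) - a) - (b - a) &= (a \oplus b) - (a \oplus (b - a)) && \text{by Eqn. (4.2)} \\
 &= (a \oplus b) - (a \oplus b) && \text{by Eqn. (4.1)} \\
 &= 0 && \text{by Eqn. (4.3)}
\end{aligned}$$

$$\begin{aligned}
(b - a) - ((a \oplus b) - a) &= b - (a \oplus ((a \oplus b) - a)) && \text{by Eqn. (4.2)} \\
 &= b - (a \oplus b) && \text{by Eqn. (4.1)} \\
 &= 0 && \text{by Eqn. (4.3)}
\end{aligned}$$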

Let $S_n(w) = 1 \oplus w \oplus w^2 \oplus \cdots \oplus w^n$, where $S_0(w) = 1$, and $S_{-1}(w) = 0$. Then
$S_n(w) = 1 \oplus (w \otimes S_{n-1}(w))$ for all $n \ge 0$. The invariant in Fig. 4.5 is that whenever
execution reaches line 4 for the nth time (counting the first time as n = 0), $wstar = S_n(w)$ and
$del = S_n(w) - S_{n-1}(w)$. We will prove this invariant by induction, but it is easy to see that if
this invariant holds, and the while loop terminates, then $wstar = w^*$.

The base case is n = 0, and is easy to establish: $wstar = del = 1$, and $S_0(w) - S_{-1}(w) = 1 - 0 = 1$
because Eqn. (4.1) gives $0 \oplus (a - 0) = 0 \oplus a$, i.e., $a - 0 = a$. The inductive case is proved as follows.
The variable wstar is updated in the loop body to $wstar \oplus (del \otimes w)$. This equals:

$$\begin{aligned}
 & S_n(w) \oplus (S_n(w) - S_{n-1}(w)) \otimes w \\
 &= (1 \oplus w \otimes S_{n-1}(w)) \oplus (S_n(w) - S_{n-1}(w)) \otimes w \\
 &= 1 \oplus (w \otimes (S_{n-1}(w) \oplus (S_n(w) - S_{n-1}(w)))) \\
 &= 1 \oplus (w \otimes (S_{n-1}(w) \oplus S_n(w))) \\
 &= 1 \oplus (w \otimes S_n(w)) \\
 &= S_{n+1}(w)
\end{aligned} \qquad (*)$$

The variable del is updated in the loop body to $(del \otimes w) - wstar$. This equals:

$$\begin{aligned}
 & ((S_n(w) - S_{n-1}(w)) \otimes w) - S_n(w) \\
 &= (S_n(w) \oplus ((S_n(w) - S_{n-1}(w)) \otimes w)) - S_n(w) && \text{by Eqn. (4.5)} \\
 &= S_{n+1}(w) - S_n(w) && \text{by } (*)
\end{aligned}$$

□
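The procedure of Fig. 4.5 translates almost line for line into executable form. The sketch below passes the semiring operations in as parameters and instantiates them with relations (union, composition, set difference); the function name, parameter names, and the example weight are hypothetical.

```python
def compose(a, b):
    """Extend for relations: pairs (x, z) with (x, y) in a and (y, z) in b."""
    return frozenset((x, z) for (x, y) in a for (y2, z) in b if y == y2)


def kleene_star_with_diff(w, one, zero, extend, combine, diff):
    """Mirror of Fig. 4.5: only the newly discovered part is extended by w."""
    wstar = delta = one
    while delta != zero:
        temp = extend(delta, w)
        delta = diff(temp, wstar)      # uses wstar before it is updated
        wstar = combine(wstar, temp)
    return wstar


ONE = frozenset((i, i) for i in range(6))
W = frozenset((i, i + 1) for i in range(5))      # chain 0 -> 1 -> ... -> 5

closure = kleene_star_with_diff(
    W, ONE, frozenset(), compose,
    combine=lambda a, b: a | b,
    diff=lambda a, b: a - b,
)
```

On the chain example each round extends only the paths found in the previous round, so the work per round stays proportional to the size of the change rather than the size of the accumulated closure.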


4.2.3  Incremental  Analysis 

An incremental algorithm for verifying finite-state properties on ICFGs was given by
Conway et al. [22]. We can use the methods presented in this chapter to generalize their
algorithm to WPDSs. An incremental approach to verification has the advantage of amortizing
the verification time across program development or debugging time.

We consider two cases: addition of new rules and deletion of existing ones. In each
case, we work at the granularity of intragraphs. Let W be the original WPDS for which we
have already computed the out-node equations E and solved them. Let W' be the WPDS
obtained from W after making some changes to it.

First, consider the addition of new rules. In this case, the fix-point solution of the out-node
equations monotonically increases and we can reuse all of the existing computation.
We identify the intragraphs that changed (i.e., they have more edges) because of the new
rules. Next, we recompute the regular expressions for out-nodes in those intragraphs and
add them to the set of out-node equations E.5 Then we solve the equations as described in
Section 4.1.2, but after setting different initial weights for the out-nodes. In the algorithm
described in Section 4.1.2, the initial weights of all out-nodes were 0. For the incremental
algorithm, set the initial weights of out-nodes that appear in E to be the weight obtained
after solving E, and set the initial weights of new out-nodes (i.e., ones that did not appear
in E) to 0. This leverages the existing information to reach a fix-point sooner.
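The effect of reusing the old solution as the starting point can be seen in a small sketch. It assumes a single out-node whose equation grows by one rule, relations as the weight domain, and a plain round-based chaotic iteration; the names `solve`, `t_foo`, and the example weights are hypothetical.

```python
def compose(a, b):
    """Extend for relations: pairs (x, z) with (x, y) in a and (y, z) in b."""
    return frozenset((x, z) for (x, y) in a for (y2, z) in b if y == y2)


def solve(equations, initial):
    """Chaotic iteration from a given initial assignment; also counts rounds."""
    value, rounds = dict(initial), 0
    changed = True
    while changed:
        changed, rounds = False, rounds + 1
        for t, rhs in equations.items():
            new = rhs(value)
            if new != value[t]:
                value[t], changed = new, True
    return value, rounds


ZERO = frozenset()
ONE = frozenset((i, i) for i in range(9))
OLD_W = frozenset((i, i + 1) for i in range(7))
NEW_W = OLD_W | {(7, 8)}                  # one rule added

old_eqs = {"t_foo": lambda v: ONE | compose(v["t_foo"], OLD_W)}
new_eqs = {"t_foo": lambda v: ONE | compose(v["t_foo"], NEW_W)}

old_solution, _ = solve(old_eqs, {"t_foo": ZERO})
cold, cold_rounds = solve(new_eqs, {"t_foo": ZERO})       # start from 0
warm, warm_rounds = solve(new_eqs, old_solution)          # start from old fix-point
```

Both runs reach the same fix-point, but the warm start needs fewer rounds because the old solution is already below the new least fix-point and most of the work carries over.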

Deletion of a rule requires more work. Again, we identify the changed intragraphs and
recompute the out-node equations for them. We call out-nodes in these intragraphs modified
out-nodes. Next, we construct the dependence graph of the out-node equations as described
in Section 4.1.2. We perform an SCC decomposition of this graph and topologically sort
the SCCs. Then the weights for all out-nodes that appear before the first SCC that has
a modified out-node need not be changed. Thus, we set the value of these out-nodes to
be the weights obtained after solving E. We recompute the solution for other out-nodes in
topological order, and stop as soon as the new weights agree with previous weights. This
is done as follows. We start with out-nodes in the first SCC that has a modified out-node,
initialize the weights of all out-nodes in this SCC to be 0, and solve the out-node equations
for the SCC. If the new weight of an out-node is different from its previously computed
weight, all out-nodes in later SCCs that are dependent on it are marked as modified. We
repeat this procedure until there are no more modified out-nodes.

The advantage of doing incremental analysis in our framework is that very little information
has to be stored between analysis runs: we only need to store the computed weights
for out-nodes.

5 There are incremental algorithms for SSPE as well, but we have not used them because solving SSPE
for a single intragraph is usually very fast.

4.3 Experiments

We compare our algorithm from Section 4.1 against the ones from [83] and Section 3.2 (for
WPDSs and EWPDSs, respectively), which are implemented in WPDS++ [49]. We refer to
the implementation of our algorithm as FWPDS (F stands for "fast"). WPDS++ supports
an optimized iteration strategy (over chaotic iteration) where the user can supply a priority
ordering on stack symbols, which is used by chaotic iteration to choose the transition with
least priority first. We refer to this version as BFS-WPDS++ and supply it with a breadth-first
ordering on the ICFG obtained by treating it as an ordinary graph. BFS-WPDS++
almost always performs better than WPDS++ with chaotic iteration.

To  measure  end-to-end  performance,  FWPDS  only  computes  the  weight  on  transitions  of 
the  output  automaton  (corresponding  to  TDG  nodes)  that  are  required  by  the  application. 
We  also  report  the  time  taken  to  compute  the  weight  on  all  transitions  and  refer  to  this 
as  FWPDS-Full.  A  comparison  with  FWPDS-Full  will  give  an  indication  of  “application- 
independent”  improvement  provided  by  our  approach  because  it  computes  the  same  amount 
of  information  as  the  previous  WPDS  algorithms.  However,  we  measure  speedups  using 
FWPDS  running  times  to  show  the  potential  of  using  lazy-evaluation  in  a  real  setting. 
FWPDS-Full uses a left-associative evaluation order for computing weights of regular
expressions. It is also worth noting that repeated squaring for computing $w^*$ did not cause any
appreciable difference compared with using a simple iterative method.

We  tested  FWPDS  on  three  applications  that  use  (E)WPDSs.  In  each,  we  perform 
GPS  on  the  (E)WPDS  with  the  entry  point  of  the  program  as  the  initial  configuration. 
The  first  application  performs  affine-relation  analysis  (ARA)  on  x86  programs  [60].  An  x86 
program  is  translated  into  a  WPDS  to  find  affine  relationships  between  machine  registers. 
The  application  only  requires  affine  relationships  at  branch  points  [3].  The  results  are  shown 
in Tab. 4.1. Over all the experiments we performed, FWPDS provided an average speedup
of 1.8× (i.e., reduced running time by 44%) over BFS-WPDS++.
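The quoted percentage follows from the speedup by simple arithmetic: a speedup of $s$ means the new running time is $1/s$ of the old one, so the relative reduction is

```latex
1 - \frac{1}{s} \;=\; 1 - \frac{1}{1.8} \;\approx\; 0.44 ,
```

that is, roughly 44% of the BFS-WPDS++ running time is saved.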

                                 Time (s)
Prog       Insts    # Procs   WPDS++   BFS-WPDS++   FWPDS-Full   FWPDS   Speedup
mplayer2    40052     385       2.11       1.30         1.17       0.69     1.88
print       75539     697       1.23       1.02         0.77       0.41     2.49
find        76240     703      11.03       8.17         6.99       4.58     1.78
attrib      76380     703       2.52       2.11         1.57       0.89     2.37
doskey      77983     716       2.27       1.83         1.15       0.75     2.44
xcopy       87000     780      22.28      15.78        13.68       8.80     1.79
sort        89291     840      13.47      11.16        10.00       6.34     1.76
more        90792     860      17.42      11.92        10.54       7.02     1.70
tracert     95459     870       9.83       8.16         7.03       4.45     1.83
finger      96123     893      11.14       7.94         7.13       4.44     1.79
rsh        100935     941      18.31      13.17        11.65       7.47     1.76
javac      101369     944      20.12      16.20        14.65       9.25     1.75
lpr        110301    1011      14.83      11.75        10.57       7.06     1.66
java       112305    1049      24.77      20.19        19.01      11.97     1.69
ftp        130255    1253      22.84      15.13        14.23       8.98     1.68
winhlp32   157634    1612      25.51      19.61        17.32      11.00     1.78
regsvr32   225857    2789      58.70      38.83        37.15      24.65     1.58
cmd        230481    2317      69.19      46.33        52.38      34.87     1.33
notepad    239408    2911      54.08      40.80        41.85      26.50     1.54

Table 4.1 Comparison of ARA results. The last column shows the speedup (ratio of
running times) of FWPDS versus BFS-WPDS++. The programs are common Windows
executables, and the experiments were run on a 3.2 GHz P4 machine with 4GB RAM.

The second application, BTRACE, is for debugging [56]. It performs path optimization on
C programs: given a set of ICFG nodes, called critical nodes, it tries to find a shortest ICFG
path that touches the maximum number of these nodes. The path starts at the entry point
of the program and stops at a given failure point in the program. FWPDS only computes
the weight at the failure point. As shown in Tab. 4.2, FWPDS performs much better than
BFS-WPDS++ for this application, and the overall speedup was 3.6×.

The  third  application  is  MOPED  [50],  which  is  a  model  checker  for  Boolean  programs. 
It  uses  its  own  WPDS  library  for  performing  reachability  queries  (which  is,  again,  based 
on  the  chaotic-iteration  strategy).  Weights  are  binary  relations  on  valuations  of  Boolean 
variables, and are represented using BDDs. We measure the performance of FWPDS against
this library using a set of programs (and an error configuration for each program) supplied
by S. Schwoon. We compute the set of all variable valuations that can hold at the error
configuration by computing its JOP weight. As shown in Tab. 4.3, FWPDS is 2 to 5 times
faster than Moped.

                                          Time (s)
Prog      ICFG nodes   # Procs   BFS-WPDS++   FWPDS-Full   FWPDS   Speedup
uucp         16973        139         4.7          3.3       2.9      1.60
me           78641        676         5.4          5.2       3.1      1.72
make         40667        204        15.1          7.7       5.8      2.58
indent       28155        104        19.6         28.2      15.9      1.24
less         33006        359        22.4          8.6       5.3      4.19
patch        27389        133        70.2         23.2      17.1      4.09
gawk         86617        401        72.7         64.5      45.1      1.61
wget         44575        399       318.4         58.9      27.0     11.77

Table 4.2 Comparison of BTrace results. The last column shows the speedup of FWPDS
over BFS-WPDS++. The critical nodes were chosen at random from ICFG nodes and the
failure site was set as the exit point of the program. The programs are common Unix
utilities, and the experiments were run on a 2.4 GHz P4 machine with 4GB RAM.

Moped can also be asked to stop as soon as it finds out that the error configuration is
reachable (instead of exploring all paths that lead to the error configuration). In that case,
when the error configuration was reachable, Moped performed much better than FWPDS,
often completing in less than a second. This is expected because the evaluation strategy used
by FWPDS is oriented towards finding the complete weight (JOP value) on a transition.
For example, it might be better to avoid saturating a loop completely and propagate
partially computed weights in the hope of finding out that the error configuration is reachable.
However, when the error configuration is unreachable, or when the abstraction-refinement
mode in MOPED is turned on, it explores all paths in the program and computes the JOP
value of all transitions. In such situations, it is likely to be better to use FWPDS.
Prog              Moped    FWPDS-Full   FWPDS   Speedup
bugs5             13.11       13.03       7.25     1.81
slam-fixed        32.67       19.23      13.3      2.46
slam               6.32        5.21       3.27     1.93
unified-serial    37.10       19.65      12.46     2.98
iscsi1            29.15       27.12      14.08     2.07
iscsi10          178.22       59.63      31.29     5.70

Table 4.3 Moped results. The last column shows the speedup of FWPDS over Moped. The
programs were provided by S. Schwoon, and are not yet publicly available.

Incremental  Analysis 

We  also  measure  the  advantage  of  doing  an  incremental  analysis  for  BTrace.  Similar 
to  the  experiments  performed  in  [22],  we  delete  a  procedure  from  a  program,  solve  GPS, 
then  reinsert  the  procedure  and  look  at  the  time  that  it  takes  to  solve  GPS  incrementally. 
We  compare  this  time  with  the  time  that  it  takes  to  compute  the  solution  from  scratch. 
We  repeated  this  for  all  procedures  in  a  given  program,  and  discarded  those  runs  that  did 
not  affect  at  least  one  other  procedure.  The  results  are  shown  in  Tab.  4.4,  which  shows  an 
average  speed  up  by  a  factor  of  6.5. 


Prog    Procs   # Recomputed   Incremental (sec)   Scratch (sec)   Improvement
less     359         91              1.66               8.6            5.18
me       676         70              0.41               5.2           12.68
uucp     139         36              2.00               3.3            1.65

Table 4.4 Results for incremental analysis for BTRACE. The third column gives the
average number of procedures for which the solution had to be recomputed. The fourth
and fifth columns report the time taken by the incremental approach and by
recomputation from scratch (using FWPDS), respectively.

4.4  Related  Work

The  basic  strategy  of  using  a  regular  expression  to  describe  a  set  of  paths  has  been 
used previously for dataflow analysis of single-procedure programs [91]. The only work that
we  are  aware  of  that  uses  this  technique  for  multi-procedure  programs  is  by  Ramalingam 
[79].  However,  there  the  regular  expressions  were  used  for  a  particular  analysis  (namely, 
execution-frequency  analysis)  and  the  technique  was  motivated  by  the  special  requirements 
of  execution-frequency  analysis  when  creating  procedure  summaries,  rather  than  efficiency. 

There  has  been  other  work  on  improving  over  the  chaotic-iteration  strategy,  but  these 
have mostly been restricted to single-procedure programs. The work on node-listing
algorithms [46] and Bourdoncle’s weak topological ordering (wto) [13] assign a priority to each
node of a graph such that nodes with lower priority in the worklist must be processed
before nodes with higher priority. In the tool CS/x86, G. Balakrishnan extended Bourdoncle’s
technique to interprocedural analysis using a two-part priority scheme: one part was the wto
priority; the other part was based on the call graph [2].

The  focus  of  our  work  has  been  on  addressing  interprocedural  analysis.  Our  techniques 
apply to any problem that can be encoded as a WPDS, and we showed how various
enhancements (incremental computation of regular expressions, computing lazily, etc.) contribute
to creating a faster analysis. At the intraprocedural level, we chose to make use of Tarjan’s
path-expression algorithm instead of the other techniques mentioned above. This was
because we were able to leverage the compactness of the regular-expression representation at
the interprocedural level as well (by computing them incrementally, lazily, etc.). It would
be  interesting  to  explore  how  node-listing  algorithms  and  Bourdoncle’s  technique  could  be 
used  interprocedurally. 

There  has  been  a  host  of  previous  work  on  incremental  program  analysis  as  well  as  on 
interprocedural automata-based analysis [22]. The incremental algorithm we have presented
is  similar  to  the  algorithm  in  [22],  but  generalizes  it  to  WPDSs  and  is  thus  applicable  in 
domains  other  than  finite-state  property  verification.  A  key  difference  with  their  algorithm 
is that they explore the property automaton on-the-fly as the program is explored. Encoding
the  property  automaton  into  a  WPDS  (Section  2.4.3)  requires  the  whole  automaton  before 
the  program  is  explored.  While  such  an  encoding  has  the  benefit  of  being  amenable  to 
symbolic  approaches,  it  can  be  disadvantageous  when  the  property  automaton  is  large  but 
only  a  small  part  of  the  property  space  is  relevant  for  the  program. 

Chapter  5

Error  Projection 

Abstraction refinement has been shown to be useful both for finding bugs and for
establishing properties of programs (Section 1.1.1). This technique has been implemented in
a  number  of  verification  tools,  including  SLAM  [4],  BLAST  [37],  and  MAGIC  [18].  In  this 
chapter,  we  show  how  to  improve  the  abstraction-refinement  process  by  making  maximum 
possible  use  of  a  given  abstraction  before  moving  to  more  refined  abstractions. 

We  accomplish  this  by  computing  error  projections  and  annotated  error  projections.  An 
error projection is the set of program nodes $N$ such that for each node $n \in N$, there exists
an  error  path  that  starts  from  the  entry  point  of  the  program  and  passes  through  n.  By 
definition,  an  error  projection  describes  all  of  the  nodes  that  are  members  of  paths  that  lead 
to  a  specified  error  in  the  model,  and  no  more.  This  allows  an  automated  verification  tool 
(or  a  human  debugging  code  manually)  to  focus  their  efforts  on  only  the  nodes  in  the  error 
projection:  every  node  not  in  the  error  projection  does  not  contribute  to  the  (apparent)  error 
(with  respect  to  the  property  being  verified).  Tools  such  as  SLAM  only  need  to  refine  the 
part  of  the  program  that  is  inside  the  projection. 

Annotated  error  projections  are  an  extension  of  error  projections.  An  annotated  error 
projection  adds  two  annotations  to  each  node  n  in  the  error  projection:  1)  A  counterexample 
(i.e.,  a  path  that  fails)  that  passes  through  n;  2)  a  set  of  data  values  (memory-configuration 
descriptors)  that  describes  the  conditions  necessary  at  n  for  the  program  to  fail.  The  goal 
is  to  give  back  to  the  user — either  an  automated  tool  or  human  debugger — more  of  the 
information  discovered  during  the  verification  process. 
From a theoretical standpoint, an error projection solves a combination of forward and
backward analyses. The forward analysis computes the set of program states $S_{fwd}$ that are
reachable from program entry; the backward analysis computes the set of states $S_{bck}$ that can
reach an error at certain pre-specified nodes. Under a sound abstraction of the program, each
of these sets provides a strong guarantee: only states in $S_{fwd}$ can ever arise in the program,
and only states in $S_{bck}$ can ever lead to error. Error projections ask the natural question
of combining these guarantees to compute the set of states $S_{err} = S_{fwd} \cap S_{bck}$ containing
all states that can both arise during program execution and lead to error. In this sense,
an error projection makes maximum use of the given abstraction by computing the
smallest envelope of states that may contribute to program failure.

Computation  of  this  intersection  turns  out  to  be  non-trivial  because  the  two  sets  Sfwd 
and  Sbck  may  be  infinite.  In  Section  5.2  and  Section  5.3,  we  show  how  to  compute  this  set 
efficiently  and  precisely  for  WPDSs.  The  techniques  that  we  use  are  general,  and  apart  from 
the  application  of  finding  error  projections,  we  discuss  additional  applications  in  Section  5.5. 

The  contributions  of  the  work  presented  in  this  chapter  can  be  summarized  as  follows: 

•  We define the notions of error projection and annotated error projection. These
projections divide the program into a correct and an incorrect part such that further analysis
need only be carried out on the incorrect part.

•  We give a novel combination of forward and backward analyses for multi-procedural
programs using weighted automata and use it for computing (annotated) error
projections (Section 5.2 and Section 5.3). We also show that our algorithms can be used for
solving various other problems in program verification (Section 5.5).

•  Our  experiments  show  that  we  can  efficiently  compute  error  projections  (Section  5.4). 

The remainder of this chapter is organized as follows: Section 5.1 motivates the
difficulty in computing (annotated) error projections and illustrates their utility. Section 5.2
and Section 5.3 give the algorithms for computing error projections and annotated error
projections, respectively. Section 5.4 presents our initial experiments. Section 5.5 covers
other applications of our algorithms. Section 5.6 discusses related work.

5.1  Examples 

Consider  the  program  shown  in  Fig.  5.1.  Here  x  is  a  global  unsigned  integer  variable,  and 
assume  that  procedure  foo  does  not  change  the  value  of  x.  Also  assume  that  the  program 
abstraction  is  a  Boolean  abstraction  in  which  integers  (only  x  in  this  case)  are  modeled  using 
8  bits,  i.e.,  the  value  of  x  can  be  between  0  and  255  with  saturated  arithmetic.  This  type  of 
abstraction  is  used  by  Moped  [85],  and  happens  to  be  a  precise  abstraction  for  this  example. 

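(1)   $\langle p, start \rangle \hookrightarrow \langle p, n_1 \rangle$          $id$
(2)   $\langle p, n_1 \rangle \hookrightarrow \langle p, c_1 \rangle$            $\{(\_, 5)\}$
(3)   $\langle p, c_1 \rangle \hookrightarrow \langle p, f_1\, r_1 \rangle$      $id$
(4)   $\langle p, r_1 \rangle \hookrightarrow \langle p, n_4 \rangle$            $id$
(5)   $\langle p, n_4 \rangle \hookrightarrow \langle p, n_7 \rangle$            $\{(i, i+2)\}$
(6)   $\langle p, start \rangle \hookrightarrow \langle p, n_2 \rangle$          $id$
(7)   $\langle p, n_2 \rangle \hookrightarrow \langle p, c_2 \rangle$            $\{(\_, 8)\}$
(8)   $\langle p, c_2 \rangle \hookrightarrow \langle p, f_1\, r_2 \rangle$      $id$
(9)   $\langle p, r_2 \rangle \hookrightarrow \langle p, n_5 \rangle$            $id$
(10)  $\langle p, n_5 \rangle \hookrightarrow \langle p, n_7 \rangle$            $\{(i, i+3)\}$
(11)  $\langle p, start \rangle \hookrightarrow \langle p, n_3 \rangle$          $id$
(12)  $\langle p, n_3 \rangle \hookrightarrow \langle p, c_3 \rangle$            $\{(\_, 9)\}$
(13)  $\langle p, c_3 \rangle \hookrightarrow \langle p, f_1\, r_3 \rangle$      $id$
(14)  $\langle p, r_3 \rangle \hookrightarrow \langle p, n_6 \rangle$            $id$
(15)  $\langle p, n_6 \rangle \hookrightarrow \langle p, n_7 \rangle$            $\{(i, i+1)\}$
(16)  $\langle p, n_7 \rangle \hookrightarrow \langle p, error \rangle$          $\{(10, 10)\}$
(17)  $\langle p, f_1 \rangle \hookrightarrow \langle p, n \rangle$              $id$
(18)  $\langle p, n \rangle \hookrightarrow \langle p, f_2 \rangle$              $id$
(19)  $\langle p, f_2 \rangle \hookrightarrow \langle p, \varepsilon \rangle$    $id$

(b)

Figure 5.1 (a) An example program and (b) its corresponding WPDS. Weights, shown in
the last column, are explained in Section 5.2.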

The  program  has  an  error  if  node  error  is  reached.  The  two  paths  on  the  left  that  set 
the  value  of  x  to  5  or  8  are  correct  paths,  and  the  one  on  the  right,  which  sets  the  value  of  x 
to 9, goes to error. The error projection is shaded in the figure. An error projection need
not  be  restricted  to  a  single  trace  (e.g.,  if  foo  had  multiple  paths  then  the  error  projection 
would  include  multiple  traces).  An  annotated  error  projection  will  additionally  tell  us  that 
the  value  of  x  at  node  n  inside  foo  has  to  be  9  on  an  error  path  passing  through  this  node. 
Note  that  the  value  of  x  can  be  5  or  8  on  other  paths  that  pass  through  n,  but  they  do  not 
lead  to  the  error  node. 

It  is  non-trivial  to  deduce  these  facts  about  the  value  of  x  at  node  n.  An  interprocedural 
forward  analysis  starting  from  start  will  show  that  the  value  of  x  at  node  n  is  in  the  set 
{5,  8,  9}.  A  backward  interprocedural  analysis  starting  from  error  concludes  that  the  value 
of  x  at  n  has  to  be  in  the  set  {7,  8,  9}  in  order  to  reach  error.  Intersecting  the  sets  obtained 
from  forward  and  backward  analysis  only  gives  an  over-approximation  of  the  annotated  error 
projection  values.  In  this  case,  the  intersection  is  {8,9},  but  x  can  never  be  8  on  a  path 
leading  to  error.  The  over-approximation  occurs  because,  in  the  forward  analysis,  the  value 
of x is 8 only when the call to foo occurs at call site $c_2$, but in the backward analysis, a path
that reaches n with x = 8 and goes to error must have had the call to foo from call site $c_1$.
This mismatch in the calling context leads to the observed over-approximation.
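
The mismatch can be reproduced with a few lines of Python. The sketch below is only an illustration: the dictionary `CALL_SITES` and the helper `sat_add` are names introduced here, and the values are read off the WPDS rules of Fig. 5.1(b) (the value assigned to x before each call to foo and the increment applied after the corresponding return).

```python
# Each call site of foo: (value assigned to x before the call,
#                         increment applied to x after foo returns).
# Values follow rules (2)/(5), (7)/(10), (12)/(15) of Fig. 5.1(b);
# the dictionary layout itself is an assumption made for illustration.
CALL_SITES = {"c1": (5, 2), "c2": (8, 3), "c3": (9, 1)}
ERROR_VALUE = 10  # rule (16) only lets x == 10 through to `error`
MAX_X = 255       # 8-bit saturated arithmetic


def sat_add(x, k):
    """Addition that saturates at MAX_X."""
    return min(x + k, MAX_X)


# Forward analysis with calling contexts merged: values of x that reach n.
forward = {value for value, _ in CALL_SITES.values()}

# Backward analysis with calling contexts merged: values of x at n from
# which *some* return site leads to `error`.
backward = {x for x in range(MAX_X + 1)
            for _, inc in CALL_SITES.values()
            if sat_add(x, inc) == ERROR_VALUE}

# Naive combination: intersect the two context-insensitive answers.
naive = forward & backward

# Context-sensitive answer: the same call site is used in both directions.
exact = {value for value, inc in CALL_SITES.values()
         if sat_add(value, inc) == ERROR_VALUE}

print(sorted(forward), sorted(backward), sorted(naive), sorted(exact))
```

The naive intersection keeps the value 8 because it pairs the forward fact from $c_2$ with the backward fact from $c_1$; matching the call site in both directions leaves only 9.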

Such  a  complication  also  occurs  while  computing  non-annotated  error  projections:  to  see 
this, assume that the edge leading to node n is predicated by the condition if (x != 9). Then,
node  n  can  be  reached  from  start,  and  there  is  a  path  starting  at  n  that  leads  to  error, 
but  both  of  these  cannot  occur  together. 

Formally,  a  node  is  in  the  error  projection  if  and  only  if  the  associated  value  set  computed 
for  the  annotated  projection  is  non-empty.  In  this  sense,  computing  an  error  projection  is  a 
special  case  of  computing  the  annotated  version.  However,  we  still  discuss  error  projections 
separately  because  (i)  computing  them  is  easier,  as  we  see  later  (computing  annotations 
requires  one  extra  trick),  and  (ii)  they  can  very  easily  be  cannibalized  by  existing  tools  such 
as  SLAM  in  their  abstraction-refinement  phase:  when  an  abstraction  needs  to  be  refined, 
only  the  portion  inside  the  error  projection  needs  to  be  rechecked.  We  illustrate  this  point 
in  more  detail  in  the  next  example. 
P                             B1                    B2                      B3

numUnits : int;                                     nUO: bool;              nUO: bool;
level : int;
void getUnit() {              void getUnit() {      void getUnit() {        void getUnit() {
[1]  canEnter: bool := F;     [1]  ...              [1]  ...                [1]  cE: bool := F;
[2]  if (numUnits = 0) {      [2]  if (?) {         [2]  if (nUO) {         [2]  if (nUO) {
[3]    if (level > 10) {      [3]    if (?) {       [3]    if (?) {         [3]    if (?) {
[4]      NewUnit();           [4]      ...          [4]      ...            [4]      ...
[5]      numUnits := 1;       [5]      ...          [5]      nUO := F;      [5]      nUO := F;
[6]      canEnter := T;       [6]      ...          [6]      ...            [6]      cE := T;
       }                             }                     }                       }
     } else                        } else                } else                  } else
[7]    canEnter := T;         [7]    ...            [7]    ...              [7]    cE := T;
[8]  if (canEnter)            [8]  if (?)           [8]  if (?)             [8]  if (cE)
[9]    if (numUnits = 0)      [9]    if (?)         [9]    if (nUO)         [9]    if (nUO)
[10]     assert(F);           [10]     ...          [10]     ...            [10]     ...
       else                          else                  else                    else
[11]     gotUnit();           [11]     ...          [11]     ...            [11]     ...
}                             }                     }                       }

Figure 5.2 An example program P and its abstractions as Boolean programs. The “...”
represents a “skip” or a no-op. The part outside the error projection is shaded in each case.


Fig.  5.2  shows  an  example  program  and  several  abstractions  that  SLAM  might  produce. 
This  example  is  given  in  [7]  to  illustrate  the  SLAM  refinement  process.  SLAM  uses  predicate 
abstraction  to  create  Boolean  programs  as  described  earlier  in  Section  1.1.1.  We  will  show 
the  utility  of  error  projections  for  abstraction  refinement  using  this  example. 

First,  recall  the  SLAM  refinement  process.  In  Fig.  5.2,  the  property  of  interest  is  the 
assertion  on  line  10.  We  want  to  verify  that  line  10  is  never  reached  (“assert(F)”  always 
triggers an assertion violation). The first abstraction B1 is created without any predicates.
It only reflects the control structure of P. Reachability analysis on B1 (assuming getUnit
is program entry) shows that the assertion is reachable. This results in a counterexample,
whose subsequent analysis reveals that the predicate {numUnits = 0} is important. Program
B2 tracks that predicate using variable nUO. Reachability analysis on B2 reveals that the
assertion is still reachable. Now predicate {canEnter = T} is added, to produce B3, which
tracks the predicate’s value using variable cE. Reachability analysis on B3 reveals that the
assertion is not reachable, hence it is not reachable in P.

The  advantage  of  using  error  projections  is  that  the  whole  program  need  not  be  abstracted 
when a new predicate is added. Analysis on B1 and B2 fails to prove that the whole program
is correct, but error projections may reveal that at least some part of the program is correct.
The parts outside the error projections (and hence correct) are shaded in the figure. Error
projection on B1 shows that line 11 cannot contribute to the bug, and need not be considered
further.  Therefore,  when  constructing  B2,  we  need  not  abstract  that  statement  with  the  new 
predicate.  Error  projection  on  B2  further  reveals  that  lines  3  to  6  and  line  7  do  not  contribute 
to  the  bug  (the  empty  else  branch  to  the  conditional  at  line  3  still  can).  Thus,  when  B3  is 
constructed,  this  part  need  not  be  abstracted  with  the  new  predicate.  B3,  with  the  shaded 
region  of  B2  excluded,  reduces  to  a  very  simple  program,  resulting  in  reduced  effort  for  its 
construction  and  analysis. 

Annotated  error  projections  can  further  reduce  the  analysis  cost.  Suppose  there  was  some 
code  between  lines  1  and  2,  possibly  relevant  to  proving  the  program  to  be  correct,  that  does 
not  modify  numUnits.  After  constructing  B2,  the  annotated  error  projection  would  tell  us 
that  in  this  region  of  code,  nUO  can  be  assumed  to  be  true,  because  otherwise  the  assertion 
cannot  be  reached.  This  might  save  half  of  the  theorem  prover  calls  needed  to  abstract  that 
region  of  code  when  using  multiple  predicates  (because  every  predicate  whose  value  is  not 
fixed  doubles  the  cost  of  abstracting  program  statements). 

While  this  example  did  not  require  an  interprocedural  analysis,  placing  any  piece  of  code 
inside  a  procedure  would  necessitate  its  use.  We  show  how  to  compute  error  projections 
when  WPDSs  or  EWPDSs  are  used  as  the  model  of  a  program.  Because  Boolean  programs 
can be encoded using EWPDSs, our techniques would be able to find the error projections
shown  in  Fig.  5.2. 

Standard interprocedural analyses do not say anything about calling contexts associated
with different reachable values of variables. As we saw earlier, a mismatch in the calling
context can lead to an over-approximation in the error projection. Because of the need
for reasoning about the calling contexts, the automata-based reachability algorithms for
(E)WPDSs offer a distinct advantage over other algorithms. The next section shows how to
combine the automata obtained from backward and forward reachability analysis on WPDSs
to compute (annotated) error projections. This technique is generalized for EWPDSs in
Section 5.2.1.

Figure 5.3 Parts of the poststar and prestar automaton, respectively.


5.2  Computing  an  Error  Projection 

As  a  running  example,  we  use  the  WPDS  of  Fig.  5.1(b)  that  models  the  program  shown 
in Fig. 5.1(a). This WPDS uses a relational weight domain over the set $V = \{0, 1, \ldots, 255\}$,
corresponding to an 8-bit encoding for the values of variable x (as explained in Section
5.1). The weight $\{(\_, 5)\}$ is shorthand for the set $\{(i, 5) \mid i \in V\}$; $\{(i, i+1)\}$ stands for
$\{(i, i+1) \mid i \in V\}$ (with saturated arithmetic); and $id$ stands for the identity relation on $V$.
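
To make the weight domain concrete, the following Python sketch represents each weight explicitly as a set of pairs over $V$. It is a minimal illustration rather than the representation used by any of the tools discussed here (which use BDDs or other symbolic encodings); the function names `const`, `add`, `extend`, and `combine` are introduced only for this example.

```python
V = range(256)  # 8-bit values of x


def sat(v):
    """Saturate at the largest 8-bit value."""
    return min(v, 255)


def const(k):
    """The weight {(_, k)}: every input value is mapped to k."""
    return frozenset((i, k) for i in V)


def add(k):
    """The weight {(i, i + k)} with saturated arithmetic."""
    return frozenset((i, sat(i + k)) for i in V)


ID = frozenset((i, i) for i in V)   # identity relation on V
ZERO = frozenset()                  # empty relation: an infeasible path


def extend(w1, w2):
    """Relational composition: apply w1 first, then w2."""
    by_source = {}
    for a, b in w2:
        by_source.setdefault(a, set()).add(b)
    return frozenset((a, c) for a, b in w1 for c in by_source.get(b, ()))


def combine(w1, w2):
    """Join of two weights: union of the relations."""
    return w1 | w2


ONLY_10 = frozenset({(10, 10)})     # weight of rule (16) in Fig. 5.1(b)

# Weights of the three start-to-error paths (identity steps omitted).
path_c1 = extend(extend(const(5), add(2)), ONLY_10)
path_c2 = extend(extend(const(8), add(3)), ONLY_10)
path_c3 = extend(extend(const(9), add(1)), ONLY_10)
```

Under this encoding only `path_c3` is non-empty, which matches the observation in Section 5.1 that only the path through the third call site reaches error.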

Let $A_S$ and $A_T$ be (unweighted) automata that accept the sets $S$ and $T$, respectively.
Recall that $\mathrm{IJOP}(S, T) = poststar(A_S)(T) = prestar(A_T)(S)$. For the program
shown in Fig. 5.1, parts of the automata produced by $poststar(\{start\})$ and
$prestar(error\ \Gamma^*)$ are shown in Fig. 5.3 (only the part important for node n is shown). Using
these, we get $\mathrm{IJOP}(\{start\}, n\ \Gamma^*) = \{(\_, 5), (\_, 8), (\_, 9)\}$ and
$\mathrm{IJOP}(n\ \Gamma^*, error\ \Gamma^*) = \{(7, 10), (8, 10), (9, 10)\}$. Here, $(\gamma\ \Gamma^*)$ stands for the set
$\{\gamma\, c \mid c \in \Gamma^*\}$.

We now define an error projection using WPDSs as our model of programs. Usually, a
WPDS created from a program has a single PDS state. Even when this is not the case, the
states can be pushed inside the weights to get a single-state WPDS. We use this to simplify
the discussion: PDS configurations are just represented as stacks ($\Gamma^*$).
Also, we concern ourselves with assertion checking. We assume that we are given a target
set of control configurations $T$ such that the program model exhibits an error only if it can
reach a configuration in that set. One way of accomplishing this is to convert every
assertion of the form “assert(E)” into a condition “if (!E) then goto error” (assuming !E is
expressible under the current abstraction), and instantiate $T$ to be the set of configurations
$(error\ \Gamma^*)$. We also assume that the weight abstraction has been constructed such that a
path $\sigma$ in the PDS is infeasible if and only if its weight $v(\sigma)$ is $\bar{0}$. Therefore, under this
model, the program has an error only when it can reach a configuration in $T$ with a path of
non-$\bar{0}$ weight.

Definition 5.2.1. Given $S$, the set of starting configurations of the program, and a target
set of configurations $T$, a program node $\gamma \in \Gamma$ is in the error projection $EP(S, T)$ if
and only if there exists a path $\sigma = \sigma_1 \sigma_2$ such that $v(\sigma) \neq \bar{0}$ and
$s \Rightarrow^{\sigma_1} c \Rightarrow^{\sigma_2} t$ for some $s \in S$, $c \in \gamma\Gamma^*$, $t \in T$.

We  calculate  the  error  projection  by  computing  a  constrained  form  of  the  join-over-all- 
paths  value,  which  we  call  a  weighted  chopping  query. 

Definition  5.2.2.  Given  regular  sets  of  configurations  S  (source),  T  (target),  and  C  (chop); 
a  weighted  chopping  query  is  to  compute  the  following  weight: 

$$WC(S, C, T) = \bigoplus \{ v(\sigma_1 \sigma_2) \mid s \Rightarrow^{\sigma_1} c \Rightarrow^{\sigma_2} t,\ s \in S,\ c \in C,\ t \in T \}$$

It is easy to see that $\gamma \in EP(S, T)$ if and only if $WC(S, \gamma\Gamma^*, T) \neq \bar{0}$. We now show how to
solve these queries. First, note that $WC(S, C, T) \neq \mathrm{IJOP}(S, C) \otimes \mathrm{IJOP}(C, T)$. For example,
in Fig. 5.1, if foo was not called from $c_3$, and $S = \{start\}$, $T = (error\ \Gamma^*)$, $C = (n\ \Gamma^*)$,
then $\mathrm{IJOP}(S, C) = \{(\_, 5), (\_, 8)\}$ and $\mathrm{IJOP}(C, T) = \{(7, 10), (8, 10)\}$, and their extend is
non-empty, whereas $WC(S, C, T) = \bar{0}$. This is exactly the problem mentioned in Section 5.1.
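
The gap between the two quantities can be checked on this modified example with a small Python sketch. It assumes an explicit set-of-pairs encoding of relational weights; the dictionaries `forward` and `backward`, keyed by the stack configuration at node n, are illustrative names, and their entries are the per-context weights implied by the rules of Fig. 5.1(b) when foo is called only from $c_1$ and $c_2$.

```python
V = range(256)


def const(k):
    """The weight {(_, k)} as an explicit relation over V."""
    return frozenset((i, k) for i in V)


def extend(w1, w2):
    """Relational composition: apply w1 first, then w2."""
    by_source = {}
    for a, b in w2:
        by_source.setdefault(a, set()).add(b)
    return frozenset((a, c) for a, b in w1 for c in by_source.get(b, ()))


# Weight of reaching each configuration "n r_j" from start.
forward = {"n r1": const(5), "n r2": const(8)}
# Weight of reaching error from each configuration "n r_j":
# via r1 the increment is 2, via r2 it is 3, and error requires x == 10.
backward = {"n r1": frozenset({(8, 10)}), "n r2": frozenset({(7, 10)})}

# Merge over all configurations first, then extend.
ijop_s_c = frozenset().union(*forward.values())
ijop_c_t = frozenset().union(*backward.values())
merged_then_extended = extend(ijop_s_c, ijop_c_t)

# Extend per configuration first, then merge: the weighted chop.
wc = frozenset().union(*(extend(forward[c], backward[c]) for c in forward))

print(len(merged_then_extended), len(wc))
```

Merging first pairs the forward weight of one calling context with the backward weight of the other, which is why the extend of the two IJOP values is non-empty while the chop itself is empty.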

A first attempt at solving weighted chopping is to use the identity
$WC(S, C, T) = \bigoplus \{ \mathrm{IJOP}(S, c) \otimes \mathrm{IJOP}(c, T) \mid c \in C \}$. However, this only works when $C$ is a finite set
of configurations, which is not the case if we want to compute an error projection. We can
solve this problem using the automata-theoretic constructions described in the previous
section. Let $A_S$ be an unweighted automaton that represents the set $S$, and similarly for $A_C$
and $A_T$. The following two algorithms, given in different columns, are valid ways of solving
a weighted chopping query.

Algorithm Double-poststar                      Algorithm Double-prestar

1. $A_1 = poststar(A_S)$                       1. $A_1 = prestar(A_T)$
2. $A_2 = (A_1 \cap A_C)$                      2. $A_2 = (A_1 \cap A_C)$
3. $A_3 = poststar(A_2)$                       3. $A_3 = prestar(A_2)$
4. $A_4 = A_3 \cap A_T$                        4. $A_4 = A_3 \cap A_S$
5. $WC(S, C, T) = path\_summary(A_4)$          5. $WC(S, C, T) = path\_summary(A_4)$

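Proof. We prove correctness of the double-poststar algorithm. A proof for double-prestar is
similar. From the properties of poststar (Lem. 2.4.6), we know that:

$$A_2(c) = \begin{cases} \bar{0} & \text{if } c \notin C \\ \bigoplus \{ v(\sigma_1) \mid s \Rightarrow^{\sigma_1} c,\ s \in S \} & \text{if } c \in C \end{cases}$$

$$\begin{aligned}
\Rightarrow\ A_3(t) &= \bigoplus \{ A_2(c) \otimes v(\sigma_2) \mid c \Rightarrow^{\sigma_2} t \} \\
&= \bigoplus \{ \bigoplus \{ v(\sigma_1) \otimes v(\sigma_2) \mid s \Rightarrow^{\sigma_1} c,\ s \in S \} \mid c \in C,\ c \Rightarrow^{\sigma_2} t \} \\
&= \bigoplus \{ v(\sigma_1) \otimes v(\sigma_2) \mid s \Rightarrow^{\sigma_1} c \Rightarrow^{\sigma_2} t,\ s \in S,\ c \in C \} \\
&= \bigoplus \{ v(\sigma_1 \sigma_2) \mid s \Rightarrow^{\sigma_1} c \Rightarrow^{\sigma_2} t,\ s \in S,\ c \in C \} \\
\Rightarrow\ path\_summary(A_4) &= A_3(T) \\
&= \bigoplus \{ v(\sigma_1 \sigma_2) \mid s \Rightarrow^{\sigma_1} c \Rightarrow^{\sigma_2} t,\ s \in S,\ c \in C,\ t \in T \} \\
&= WC(S, C, T)
\end{aligned}$$

□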


The running time of these algorithms is proportional to the size of $A_C$, not the size of $C$.

An error projection is computed by solving a separate weighted chopping query for each
node $\gamma$ in the program. This means that the source set $S$ and the target set $T$ remain fixed,
but the chop set $C$ keeps changing. Unfortunately, the two algorithms given above have a
major shortcoming: only their first steps can be carried over from one chopping query to the
next; the rest of the steps have to be recomputed for each node $\gamma$. As shown in Section 5.4,
this approach is very slow, and the algorithm discussed next is about three orders of magnitude
faster.

To  derive  a  better  algorithm  for  weighted  chopping  that  is  more  suited  for  computing 
error  projections,  let  us  first  look  at  the  unweighted  case  (i.e.,  the  weighted  case  where  the 
weight domain just contains the weights $\bar{0}$ and $\bar{1}$). Then $WC(S, C, T) = \bar{1}$ if and only if
$(post^*(S) \cap pre^*(T)) \cap C \neq \emptyset$. This procedure just requires a single intersection operation for
different chop sets. The computation of $post^*(S)$ and of $pre^*(T)$ has to be done just once.
We generalize this approach to the weighted case.
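
The unweighted procedure can be tried out on the running example with the Python sketch below. It makes two simplifying assumptions that are not part of the algorithms in this chapter: the value of x is folded into the configuration (so that the system becomes unweighted), and configurations are enumerated explicitly, which terminates only because the example has no recursion. The rule table mirrors Fig. 5.1(b); all function names are introduced for this illustration.

```python
from collections import deque

MAX_X = 255


def ident(x):
    return x


def const(k):
    return lambda x: k


def add(k):
    return lambda x: min(x + k, MAX_X)   # saturated arithmetic


def only(k):
    return lambda x: x if x == k else None   # None marks an infeasible step


# (top of stack, symbols replacing it, effect on x), following Fig. 5.1(b).
RULES = [
    ("start", ("n1",), ident), ("n1", ("c1",), const(5)),
    ("c1", ("f1", "r1"), ident), ("r1", ("n4",), ident), ("n4", ("n7",), add(2)),
    ("start", ("n2",), ident), ("n2", ("c2",), const(8)),
    ("c2", ("f1", "r2"), ident), ("r2", ("n5",), ident), ("n5", ("n7",), add(3)),
    ("start", ("n3",), ident), ("n3", ("c3",), const(9)),
    ("c3", ("f1", "r3"), ident), ("r3", ("n6",), ident), ("n6", ("n7",), add(1)),
    ("n7", ("error",), only(10)),
    ("f1", ("n",), ident), ("n", ("f2",), ident), ("f2", (), ident),
]


def successors(config):
    """One-step successors of a configuration (x, stack)."""
    x, stack = config
    if not stack:
        return
    top, rest = stack[0], stack[1:]
    for lhs, rhs, effect in RULES:
        if lhs == top:
            y = effect(x)
            if y is not None:
                yield (y, rhs + rest)


def reach(sources, step):
    """Plain worklist reachability."""
    seen, work = set(sources), deque(sources)
    while work:
        c = work.popleft()
        for d in step(c):
            if d not in seen:
                seen.add(d)
                work.append(d)
    return seen


def forward_backward_chop():
    """Return post*(S) intersected with pre*(T), computed once."""
    sources = [(x, ("start",)) for x in range(MAX_X + 1)]
    post = reach(sources, successors)
    preds = {}
    for c in post:
        for d in successors(c):
            preds.setdefault(d, []).append(c)
    targets = [c for c in post if c[1] and c[1][0] == "error"]
    return reach(targets, lambda c: preds.get(c, ()))


both = forward_backward_chop()
# A node is in the error projection if some configuration in `both`
# has that node on top of the stack (chop set: node followed by any stack).
projection = {stack[0] for _, stack in both if stack}
values_at_n = {x for x, stack in both if stack and stack[0] == "n"}
print(sorted(projection), values_at_n)
```

Both reachability sets are computed once, and each node is then checked against the shared intersection, which is the property the weighted generalization aims to preserve.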

First, we need to define what we mean by intersecting weighted automata. Let $A_1$ and
$A_2$ be two weighted automata. Define their intersection $A_1 \lhd A_2$ to be a function from
configurations to weights, which we later compute in the form of a weighted automaton,
such that $(A_1 \lhd A_2)(c) = A_1(c) \otimes A_2(c)$.¹ Define $(A_1 \lhd A_2)(C) = \bigoplus \{ (A_1 \lhd A_2)(c) \mid c \in C \}$,
as before. Based on this definition, if $A_{post^*} = poststar(A_S)$ and $A_{pre^*} = prestar(A_T)$, then
$WC(S, C, T) = (A_{post^*} \lhd A_{pre^*})(C)$.

Let us give some intuition into why intersecting weighted automata is hard. For $A_1$ and
$A_2$ as above, the intersection is defined to read off the weight from $A_1$ first and then extend
it with the weight from $A_2$. A naive approach would be to construct a weighted automaton
$A_{12}$ as the concatenation of $A_1$ and $A_2$ (with epsilon transitions from the final states of $A_1$ to
the initial states of $A_2$) and let $(A_1 \lhd A_2)(c) = A_{12}(c\, c)$. However, computing $(A_1 \lhd A_2)(C)$
for a regular set $C$ requires computing join-over-all-paths in $A_{12}$ over the set of paths that
accept the language $\{ (c\, c) \mid c \in C \}$ because the same path (i.e., $c$) must be followed in both
$A_1$ and $A_2$. This language is neither regular nor context-free, and we do not know of any
method that computes join-over-all-paths over a non-context-free set of paths.

The trick here is to recognize that weighted automata have a direction in which weights
are read off. We need to intersect $A_{post^*}$ with $A_{pre^*}$, where $A_{post^*}$ is a backward automaton
and $A_{pre^*}$ is a forward automaton. If we concatenate these together but reverse the second
one (i.e., reverse all its transitions and interchange its initial and final states), then we get a
purely backward weighted automaton and we only need to solve for join-over-all-paths over
the language $\{ (c\, c^R) \mid c \in C \}$ where $c^R$ is $c$ written in the reverse order. This language can be
defined using a linear context-free grammar with production rules of the form “$X \rightarrow \gamma Y \gamma$”,
where $X$ and $Y$ are non-terminals. The following section uses this intuition to derive an
algorithm for intersecting two weighted automata.

¹ Note that the operator $\lhd$ is not commutative in general, but we still call it intersection because the
construction of $A_1 \lhd A_2$ resembles the one for intersection of unweighted automata.

Intersecting  Weighted  Automata 

Let $\mathcal{A}_b = (Q_b, \Gamma, \rightarrow_b, P, F_b)$ be a backward weighted automaton and
$\mathcal{A}_f = (Q_f, \Gamma, \rightarrow_f, P, F_f)$ be a forward weighted automaton. We proceed with the standard automata-
intersection algorithm: Construct a new automaton $\mathcal{A}_{bf} = (Q_b \times Q_f, \Gamma, \rightarrow, P, F_b \times F_f)$,
where we identify the state $(p, p)$, $p \in P$, with $p$, i.e., the $P$-states of $\mathcal{A}_{bf}$ are states of the
form $(p, p)$, $p \in P$. The transitions of this automaton are computed by matching on stack
symbols. If $t_b = (q_1, \gamma, q_2)$ is a transition in $\mathcal{A}_b$ with weight $w_b$ and $t_f = (q_3, \gamma, q_4)$ is a
transition in $\mathcal{A}_f$ with weight $w_f$, then add transition $t_{bf} = ((q_1, q_3), \gamma, (q_2, q_4))$ to $\mathcal{A}_{bf}$ with weight
$\lambda z.(w_b \otimes z \otimes w_f)$. We call this type of weight a functional weight and use the capital letter
$W$ (possibly subscripted) to distinguish them from normal weights. Functional weights are
special functions on weights: given a weight $w$ and a functional weight $W = \lambda z.(w_1 \otimes z \otimes w_2)$,
$W(w) = (w_1 \otimes w \otimes w_2)$. The automaton $\mathcal{A}_{bf}$ is called a functional automaton.
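The product step can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the function name `intersect`, the tuple layout `(source, symbol, target, weight)`, and the toy automata are assumptions made here. A functional weight $\lambda z.(w_b \otimes z \otimes w_f)$ is stored as the pair `(w_b, w_f)`, and the identification of $(p, p)$ with $p$ is left implicit.

```python
from collections import defaultdict

def intersect(trans_b, trans_f):
    """Pair transitions of a backward and a forward automaton on equal symbols.

    Each input transition is (source, symbol, target, weight). The result maps
    ((q1, q3), symbol, (q2, q4)) to the pair (w_b, w_f), which stands for the
    functional weight  lambda z. w_b (x) z (x) w_f.
    """
    by_symbol = defaultdict(list)
    for (q3, sym, q4, w_f) in trans_f:
        by_symbol[sym].append((q3, q4, w_f))

    product = {}
    for (q1, sym, q2, w_b) in trans_b:
        for (q3, q4, w_f) in by_symbol[sym]:
            product[((q1, q3), sym, (q2, q4))] = (w_b, w_f)
    return product

# Hypothetical automata over the stack symbols "a" and "b".
A_b = [("p", "a", "s1", "wb1"), ("s1", "b", "fb", "wb2")]
A_f = [("p", "a", "t1", "wf1"), ("t1", "b", "ff", "wf2"), ("t1", "a", "ff", "wf3")]
A_bf = intersect(A_b, A_f)
```

Storing only the pair keeps the product construction independent of the weight domain; the semiring is needed only when the functional weights are later applied.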

We define extend on functional weights as reversed function composition. That is, if
$W_1 = \lambda z.(w_1 \otimes z \otimes w_2)$ and $W_2 = \lambda z.(w_3 \otimes z \otimes w_4)$, then
$W_1 \otimes W_2 = W_2 \circ W_1 = \lambda z.((w_3 \otimes w_1) \otimes z \otimes (w_2 \otimes w_4))$,
and is thus also a functional weight. However, the combine operator,
defined as $W_1 \oplus W_2 = \lambda z.(W_1(z) \oplus W_2(z))$, does not preserve the form of functional weights.
Hence, functional weights do not form a semiring. We now show that this is not a handicap,
and we can still compute $\mathcal{A}_b \lhd \mathcal{A}_f$ as required.
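To make the reversed composition concrete, the following sketch models ordinary weights as words of labels, so that extend is concatenation and the empty word plays the role of $\overline{1}$. That choice of weight domain, and the class name `Functional`, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Weight = Tuple[str, ...]   # a word of labels; extend is concatenation
ONE: Weight = ()           # neutral element for extend

@dataclass(frozen=True)
class Functional:
    """Stands for  lambda z. left (x) z (x) right."""
    left: Weight
    right: Weight

    def __call__(self, z: Weight) -> Weight:
        return self.left + z + self.right

    def extend(self, other: "Functional") -> "Functional":
        # Reversed composition: (self (x) other)(z) == other(self(z)).
        return Functional(other.left + self.left, self.right + other.right)

W1 = Functional(("w1",), ("w2",))
W2 = Functional(("w3",), ("w4",))
W12 = W1.extend(W2)
print(W12(ONE))   # ('w3', 'w1', 'w2', 'w4')
```

The result of `extend` is again a pair of a left and a right weight, which is the closure property stated above; no analogous pair exists for combine in general.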

Because $\mathcal{A}_{bf}$ is obtained from an intersection operation, every path in it that is of the
form $(q_1, q_2) \rightarrow^* (q_3, q_4)$ is in one-to-one correspondence with paths $q_1 \rightarrow_b^* q_3$ in $\mathcal{A}_b$ and
$q_2 \rightarrow_f^* q_4$ in $\mathcal{A}_f$. Using this fact, we get that the weight of a path in $\mathcal{A}_{bf}$ will be a function
of the form $\lambda z.(w_b \otimes z \otimes w_f)$, where $w_b$ and $w_f$ are the weights of the corresponding paths in
$\mathcal{A}_b$ and $\mathcal{A}_f$, respectively. In this sense, $\mathcal{A}_{bf}$ is constructed based on the intuition given in the
previous section: the functional weights resemble grammar productions "$X \rightarrow \gamma Y \gamma$" for the
language $\{(c\,c^R)\}$ with weights replacing the two occurrences of $\gamma$, and their composition
resembles the derivation of a string in the language. (Note that in "$X \rightarrow \gamma Y \gamma$", the first $\gamma$
is a letter in $c$, whereas the second $\gamma$ is a letter in $c^R$. In general, the letters will be given
different weights in $\mathcal{A}_b$ and $\mathcal{A}_f$.)

Formally, for a configuration $c$ and a weighted automaton $\mathcal{A}$, define the predicate
$accPath(\mathcal{A}, c, w)$ to be true if there is an accepting path in $\mathcal{A}$ for $c$ that has weight $w$,
and false otherwise (note that we only need the extend operation to compute the weight of
a path). Similarly, $accPath(\mathcal{A}, C, w)$ is true iff $accPath(\mathcal{A}, c, w)$ is true for some $c \in C$. Then
we have:


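\[
\begin{aligned}
(\mathcal{A}_b \lhd \mathcal{A}_f)(c)
  &= \mathcal{A}_b(c) \otimes \mathcal{A}_f(c) \\
  &= \bigoplus \{ w_b \otimes w_f \mid accPath(\mathcal{A}_b, c, w_b),\ accPath(\mathcal{A}_f, c, w_f) \} \\
  &= \bigoplus \{ w_b \otimes w_f \mid accPath(\mathcal{A}_{bf}, c, \lambda z.(w_b \otimes z \otimes w_f)) \} \\
  &= \bigoplus \{ \lambda z.(w_b \otimes z \otimes w_f)(\overline{1}) \mid accPath(\mathcal{A}_{bf}, c, \lambda z.(w_b \otimes z \otimes w_f)) \} \\
  &= \bigoplus \{ W(\overline{1}) \mid accPath(\mathcal{A}_{bf}, c, W) \}
\end{aligned}
\]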


Similarly, we have $(\mathcal{A}_b \lhd \mathcal{A}_f)(C) = \bigoplus \{W(\overline{1}) \mid accPath(\mathcal{A}_{bf}, C, W)\} = \bigoplus \{W(\overline{1}) \mid
accPath(\mathcal{A}_{bf} \cap \mathcal{A}_C, \Gamma^*, W)\}$, where $\mathcal{A}_C$ is an unweighted automaton that accepts the set
$C$, and this can be obtained using a procedure similar to path_summary. The advantage of
the way we have defined $\mathcal{A}_{bf}$ is that we can intersect it with $\mathcal{A}_C$ (via ordinary intersection)
and then run path_summary over it, as we show next.

Functional weights distribute over (ordinary) weights, i.e., $W(w_1 \oplus w_2) = W(w_1) \oplus W(w_2)$.
Thus, path_summary($\mathcal{A}_{bf}$) can be obtained merely by solving an intraprocedural join-over-
all-paths over distributive transformers starting with the weight $\overline{1}$, which is completely
standard: Initialize $l(q) = \overline{1}$ for initial states, and set $l(q) = \overline{0}$ for other states. Then, until a
fixpoint is reached, for a transition $(q, \gamma, q')$ with weight $W$, update the weight on state $q'$ by
$l(q') := l(q') \oplus W(l(q))$. Then path_summary($\mathcal{A}_{bf}$) is the combine of the weights on the final
states. Termination is guaranteed because we still have weights associated with states, and
functional weights are monotonic. Because of the properties satisfied by $\mathcal{A}_{bf}$, we use $\mathcal{A}_{bf}$ as
a representation for $(\mathcal{A}_b \lhd \mathcal{A}_f)$.

Figure 5.4 Functional automaton obtained after intersecting the automata of Fig. 5.3.
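The fixpoint iteration just described can be sketched as follows. The sketch assumes the min-plus semiring (combine is `min`, extend is `+`, $\overline{0}$ is infinity, $\overline{1}$ is 0) with non-negative weights, which is a choice made here only for illustration; the function name, the transition layout, and the toy automaton are likewise hypothetical. The `start` parameter is the weight placed on the initial states.

```python
import math

ZERO, ONE = math.inf, 0.0   # min-plus semiring: combine = min, extend = +

def path_summary(transitions, initial, final, start=ONE):
    """Join-over-all-paths on a functional automaton.

    transitions: list of (q, symbol, q2, (w_b, w_f)), where the pair stands
    for  lambda z. w_b + z + w_f.  Returns the combine of l(q) over `final`.
    """
    l = {}
    for (q, _sym, q2, _w) in transitions:
        l[q] = ZERO
        l[q2] = ZERO
    for q in final:
        l.setdefault(q, ZERO)
    for q in initial:
        l[q] = start

    changed = True
    while changed:                      # iterate until a fixpoint is reached
        changed = False
        for (q, _sym, q2, (w_b, w_f)) in transitions:
            candidate = w_b + l[q] + w_f        # W(l(q))
            if candidate < l[q2]:               # l(q2) := l(q2) combine W(l(q))
                l[q2] = candidate
                changed = True
    return min((l[q] for q in final), default=ZERO)

A_bf = [("i", "a", "m", (1.0, 2.0)),
        ("m", "b", "f", (3.0, 4.0)),
        ("i", "b", "f", (20.0, 0.0))]
print(path_summary(A_bf, {"i"}, {"f"}))   # 10.0
```

Only ordinary weights are stored on states, so the missing combine on functional weights never comes into play.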

This allows us to solve $WC(S, C, T) = (\mathcal{A}_{post^*} \lhd \mathcal{A}_{pre^*})(C)$. That is, after a preparation
step to create $(\mathcal{A}_{post^*} \lhd \mathcal{A}_{pre^*})$, one can solve $WC(S, C, T)$ for different chop sets $C$ just using
intersection with $\mathcal{A}_C$ followed by path_summary, as shown above. Fig. 5.4 shows an example.
For short, the weight $\lambda z.(w_1 \otimes z \otimes w_2)$ is denoted by $[w_1.z.w_2]$. Note how the weights for
different call sites get appropriately paired in the functional automaton.

It  should  be  noted  that  this  technique  applies  only  to  the  intersection  of  a  forward 
weighted  automaton  with  a  backward  one,  because  in  this  case  we  are  able  to  get  around  the 
problem of computing join-over-all-paths over a non-context-free set of paths. Algorithms
for  intersecting  two  forward  or  two  backward  automata  will  be  discussed  in  Section  6.5.1. 

5.2.1  Computing  Error  Projections  for  EWPDSs 

Computing error projections for EWPDSs is slightly harder than for WPDSs for the
following reason: for a rule sequence $\sigma = \sigma_1 \sigma_2$, in general $v(\sigma) \neq v(\sigma_1) \otimes v(\sigma_2)$ because an unbalanced
call at the end of $\sigma_1$ may match with an unbalanced return in the beginning of $\sigma_2$, in
which case a merge function has to be applied. Thus, $WC(S, \{c\}, T) \neq IJOP(S, \{c\}) \otimes
IJOP(\{c\}, T)$.

Our solution for computing an error projection for EWPDSs is also based on "intersecting"
the weighted automata $\mathcal{A}_{post^*} = poststar(S)$ and $\mathcal{A}_{pre^*} = prestar(T)$. However, it turns
out that, in general, $\mathcal{A}_{post^*}$ does not retain enough information about the unbalanced calls.
We restrict the set $S$ to be just the starting configuration of the program, i.e., $S = \{\langle p, e_{main} \rangle\}$
(or any singleton set with a configuration with one stack symbol).

The intersection operation for weighted automata is carried out in the same manner as
for WPDSs, but instead of always creating functional weights of the form $\lambda z.(w_1 \otimes z \otimes w_2)$,
we may also create weights of the form $\lambda z.(m(w_1, z) \otimes w_2)$, where $m$ is a merge
function. We explain our strategy through an example, and then give the algorithm.


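(1)  $\langle p, start \rangle \hookrightarrow \langle p, c_1 \rangle$      $w_1$
(2)  $\langle p, start \rangle \hookrightarrow \langle p, c_2 \rangle$      $w_2$
(3)  $\langle p, start \rangle \hookrightarrow \langle p, c_3 \rangle$      $w_3$
(4)  $\langle p, c_1 \rangle \hookrightarrow \langle p, f_1\, r_1 \rangle$      $w_4$
(5)  $\langle p, c_2 \rangle \hookrightarrow \langle p, f_1\, r_2 \rangle$      $w_5$
(6)  $\langle p, c_3 \rangle \hookrightarrow \langle p, f_1\, r_3 \rangle$      $w_6$
(7)  $\langle p, r_1 \rangle \hookrightarrow \langle p, error \rangle$      $w_7$
(8)  $\langle p, r_2 \rangle \hookrightarrow \langle p, error \rangle$      $w_8$
(9)  $\langle p, r_3 \rangle \hookrightarrow \langle p, error \rangle$      $w_9$
(10) $\langle p, f_1 \rangle \hookrightarrow \langle p, n \rangle$      $w_{10}$
(11) $\langle p, n \rangle \hookrightarrow \langle p, f_2 \rangle$      $w_{11}$
(12) $\langle p, f_2 \rangle \hookrightarrow \langle p, \epsilon \rangle$      $w_{12}$

(a)

Figure 5.5 (a) A WPDS. (b),(c) Parts of the poststar and prestar automata, respectively.
(d) Functional automaton obtained after intersecting the automata shown in (b) and (c).
(e) Functional automaton for an EWPDS when the call rule at $c_i$ is associated with merge
function $m_i$.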
Consider the example shown in Fig. 5.5, which is a reworking of the example shown in
Fig. 5.1. Let $S = \{\langle p, start \rangle\}$ and $T = \{\langle p, error \rangle\}$. The weights have been left unspecified
so that we can track their contribution to the weights in the functional automaton that is
shown in Fig. 5.5(d). Let us call this automaton $\mathcal{A}_{func}$.

The functional weight $\mathcal{A}_{func}(\langle p, n\, r_1 \rangle)$ applied to $\overline{1}$ equals $(w_1 \otimes w_4 \otimes w_{10} \otimes w_{11} \otimes w_{12} \otimes w_7)$
(which equals $WC(S, \{\langle p, n\, r_1 \rangle\}, T)$). The weight on the transition with stack symbol $n$
summarizes the set of all paths from $f_1$ to $n$ (weight $w_{10}$) and $n$ to the exit of the procedure,
including the return rule (weight $w_{11} \otimes w_{12}$). Putting these together, the weight $(w_{10} \otimes w_{11} \otimes
w_{12})$ is the summary of the procedure starting at $f_1$. Similarly, the two components of the
functional weight on the transition for $r_1$ are the weights of the paths from start to $c_1$,
including the call (weight $w_1 \otimes w_4$), and from $r_1$ to error (weight $w_7$), respectively. This
functional weight summarizes paths from start to error with a hole, which is filled by the
variable $z$ and represents the summary of a called procedure. Thus, for an EWPDS, the functional weight
on the transition for $r_1$ must apply a merge function.

Let $\mathcal{W}_e$ be an EWPDS obtained by associating merge function $m_i$ with call
rule $\langle p, c_i \rangle \hookrightarrow \langle p, f_1\, r_i \rangle$ in the WPDS shown in Fig. 5.5(a). For $\mathcal{W}_e$, the weight
$WC(S, \{\langle p, n\, r_1 \rangle\}, T)$ is $m_1(w_1, w_{10} \otimes w_{11} \otimes w_{12}) \otimes w_7$. The functional automaton shown
in Fig. 5.5(e) computes exactly this weight for $\langle p, n\, r_1 \rangle$. Next, we outline the algorithm for
constructing this automaton.
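As a worked sketch of how these two weights arise, write $W_n$ and $W_{r_1}$ for the functional weights on the transitions labeled $n$ and $r_1$ (these two names are introduced here only for illustration). The EWPDS line takes the call-rule weight $w_4$ to be $\overline{1}$, in line with the simplifying assumption on call rules used in the construction.

```latex
\begin{align*}
W_n(\overline{1}) &= w_{10} \otimes \overline{1} \otimes (w_{11} \otimes w_{12})
                   = w_{10} \otimes w_{11} \otimes w_{12} \\
\text{WPDS:}\quad W_{r_1} &= \lambda z.((w_1 \otimes w_4) \otimes z \otimes w_7), &
W_{r_1}(W_n(\overline{1})) &= w_1 \otimes w_4 \otimes w_{10} \otimes w_{11} \otimes w_{12} \otimes w_7 \\
\text{EWPDS:}\quad W^{e}_{r_1} &= \lambda z.(m_1(w_1, z) \otimes w_7), &
W^{e}_{r_1}(W_n(\overline{1})) &= m_1(w_1,\ w_{10} \otimes w_{11} \otimes w_{12}) \otimes w_7
\end{align*}
```

In both cases the procedure summary $w_{10} \otimes w_{11} \otimes w_{12}$ fills the hole $z$; the only difference is whether it is joined to the caller's weight by extend or by the merge function.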

Let $\mathcal{A}_{post^*} = poststar(S)$ and $\mathcal{A}_{pre^*} = prestar(T)$. The set of states of $\mathcal{A}_{pre^*}$ is $(P \cup Q_T)$,
where $Q_T$ is the set of states of the automaton that represents $T$. The set of states of $\mathcal{A}_{post^*}$
is $(P \cup Q_{mid} \cup Q_S)$, where $Q_S$ is the set of states of the automaton that represents $S$. To
distinguish the two occurrences of $P$ in these sets, we label the former as $P_T$ and the latter
as $P_S$. To simplify the discussion, we assume that the weight on a call rule is always $\overline{1}$.
(The construction given in Section 3.7 shows how one can convert an EWPDS that does not
satisfy this restriction into one that does satisfy it.) The functional automaton is constructed
as before, except for the weights on transitions. For each transition $t_1 = (q_1, \gamma, q_2)$ of $\mathcal{A}_{post^*}$
with weight $w_1$ and transition $t_2 = (q_3, \gamma, q_4)$ of $\mathcal{A}_{pre^*}$ with weight $w_2$, add a transition to
$(\mathcal{A}_{post^*} \lhd \mathcal{A}_{pre^*})$ as follows:

1. If $t_1 \in (Q_{mid} \times \Gamma \times Q_{mid})$ or $t_1 \in (Q_{mid} \times \Gamma \times Q_S)$ then let $q_1 = q_{p',\gamma'}$ and $r =
lookupPushRule(p', \gamma'\gamma)$. Add transition $((q_1, q_3), \gamma, (q_2, q_4))$ to the functional
automaton with weight $\lambda z.(m_r(w_1, z) \otimes w_2)$.

2. In all other cases, add transition $((q_1, q_3), \gamma, (q_2, q_4))$ to the functional automaton with
weight $\lambda z.(w_1 \otimes z \otimes w_2)$.

The  reader  can  verify  that  this  algorithm  will  produce  the  automaton  shown  in  Fig.  5.5(e), 
after  removing  the  states  that  cannot  be  reached  from  the  initial  state,  or  cannot  reach  a 
final  state. 
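The case split in this algorithm can be sketched as follows. This is a hedged illustration only: weights are symbolic strings, `merge_for` is a hypothetical stand-in for looking up the push rule $r$ and its merge function $m_r$, and the state names (`"s_final"`, the `("mid", p, gamma)` tuples) are invented for the example.

```python
def extend(a, b):
    """Extend on symbolic string weights; the empty string plays the role of 1."""
    return ".".join(x for x in (a, b) if x)

def ewpds_functional_weight(t1, w1, w2, q_mid, q_s, merge_for):
    """Return the functional weight for a paired transition.

    t1 = (q1, gamma, q2) is the transition of the poststar automaton with
    weight w1; w2 is the weight of the matching prestar transition.
    """
    q1, gamma, q2 = t1
    if q1 in q_mid and (q2 in q_mid or q2 in q_s):
        m = merge_for(q1, gamma)                    # item 1: apply a merge function
        return lambda z: extend(m(w1, z), w2)
    return lambda z: extend(extend(w1, z), w2)      # item 2: plain functional weight

# Hypothetical encoding of the relevant part of Fig. 5.5.
q_mid = {("mid", "p", "f1")}
q_s = {"s_final"}
merge_for = lambda q1, gamma: (lambda w, z: f"m1({w},{z})")

W_r1 = ewpds_functional_weight((("mid", "p", "f1"), "r1", "s_final"),
                               "w1", "w7", q_mid, q_s, merge_for)
W_n = ewpds_functional_weight(("p", "n", ("mid", "p", "f1")),
                              "w10", "w11.w12", q_mid, q_s, merge_for)
print(W_r1(W_n("")))   # m1(w1,w10.w11.w12).w7
```

The printed term has the same shape as the weight $m_1(w_1, w_{10} \otimes w_{11} \otimes w_{12}) \otimes w_7$ quoted for the example.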

A justification of this algorithm is based on the types of rule sequences captured by a
transition in $\mathcal{A}_{post^*}$ and $\mathcal{A}_{pre^*}$, which are given in Section 3.7 (Figs. 3.8 and 3.11). We recall
the results of that section and show them pictorially in Fig. 5.6(a) and (b). (They have been
simplified using the restriction imposed on $S$.) Fig. 5.6(a) can be read as follows: the weight
on a transition $t \in (Q_{mid} \times \Gamma \times Q_{mid})$ summarizes the weights of rule sequences derivable
from $(\sigma_b R_2)$, i.e., ones that have a balanced sequence followed by a call rule; a transition
$t \in P_S \times \Gamma \times Q_S$ summarizes rule sequences derivable from $\sigma_b$; and so on. Similarly for
Fig. 5.6(b). Taking the intersection of these automata, one gets the automaton shown in
Fig. 5.6(c). Let us call this automaton $\mathcal{A}_e$.

The initial states of $\mathcal{A}_e$ are $P_S \times P_T$ and the final states belong to $Q_S \times Q_T$. The states
$P_S \times Q_T$ cannot be reached from the initial states, and $Q_S \times P_T$ cannot reach a final state.
As a consequence, these states, and their transitions, have not been shown in the figure.

We will show that a merge function needs to be applied if and only if a transition starting
from a state in $Q_{mid} \times P_T$ is taken. (Because these are exactly the transitions created in
item 1 of the algorithm outlined above, proving this claim shows that our construction is
correct.) Intuitively, our claim holds because a merge function is applied during poststar
when a state is in $Q_{mid}$ (last case of Fig. 3.4) and is applied during prestar when a state is
in $P$ (last case of Fig. 3.3). Thus, in the functional automaton, both conditions have to be
satisfied (i.e., a state must lie in $Q_{mid} \times P_T$) for a merge function to be applied.

Figure 5.6 (a) Rule sequences for $\mathcal{A}_{post^*}$. (b) Rule sequences for $\mathcal{A}_{pre^*}$. (c) Rule sequences
for the functional automaton $\mathcal{A}_{post^*} \lhd \mathcal{A}_{pre^*}$.

The annotations on the transitions of $\mathcal{A}_e$ are like functional weights. Every path in $\mathcal{A}_e$
can be associated with a rule sequence that the path represents, in a manner similar to the
way one calculates the weight of a path in a functional automaton. However, instead of
starting with weight $\overline{1}$ (which is done for reading weights out of a functional automaton),
one starts with the empty rule sequence $\epsilon$. For instance, if one takes a transition $t_1$ from a
state in $P_S \times P_T$ to a state in $Q_{mid} \times P_T$ and then $t_2$ to another state in $Q_{mid} \times P_T$, the net
rule sequence for $(t_1 t_2)$ is the following:


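\[
\begin{aligned}
  & (\lambda z.(\sigma_b R_2\, z\, \sigma_b R_0)\ (\lambda z.(\sigma_b\, z\, \sigma_b R_0)\ \epsilon)) \\
= {} & (\lambda z.(\sigma_b R_2\, z\, \sigma_b R_0)\ (\sigma_b \sigma_b R_0)) \\
= {} & (\lambda z.(\sigma_b R_2\, z\, \sigma_b R_0)\ (\sigma_b R_0)) \\
= {} & (\sigma_b R_2 (\sigma_b R_0) \sigma_b R_0) \\
= {} & (\sigma_b R_0)
\end{aligned}
\]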

Here we have simplified expressions using the grammar of Fig. 3.2: we replace $(\sigma_b \sigma_b)$
with $\sigma_b$, and $(R_2 \sigma_b R_0)$ with $\sigma_b$, because each denotes a balanced sequence.

One can prove that for all paths in $\mathcal{A}_e$ that start from an initial state and end in a state
in $Q_{mid} \times P_T$, the rule sequence of that path is $(\sigma_b R_0)$, i.e., it has one unbalanced return in
the end. When one takes any transition starting in $Q_{mid} \times P_T$, the rule sequence becomes
$(\lambda z.(\sigma_b R_2\, z\, \sigma_b R_0)\ (\sigma_b R_0))$ or $(\lambda z.(\sigma_b R_2\, z\, (R_2 \sigma_b)^*)\ (\sigma_b R_0))$. In each case, the leftmost
$R_2$ of the functional weight matches with the $R_0$ of the incoming rule sequence, and a merge
function needs to be applied.

5.3  Computing  an  Annotated  Error  Projection 

An  annotated  error  projection  adds  more  information  to  an  error  projection  by  associating 
each node in the error projection with (i) at least one counterexample that goes through
that node and (ii) the set of data values that may arise on a path doomed to fail in the
future.

5.3.1  Computing  Witnesses 

Given source set $S$ and target set $T$, previous work on WPDSs allows the computation
of a finite set of paths, called witnesses, $\{\sigma_i \mid 1 \le i \le n\}$ such that $\bigoplus_i \{v(\sigma_i)\} = IJOP(S, T)$
[83]. The same result holds for path_summary on weighted as well as functional automata:
we can find a finite set of paths in the automaton that justifies the weight returned by
path_summary (we say that a set of paths justifies a weight $w$ if the combine of their weights
is equal to $w$). We make use of this technology in this section.

Suppose we find that $\gamma$ is in the error projection. Then, we know that $WC(S, C, T) \neq \overline{0}$,
where $C = \gamma\Gamma^*$. We will find a path from some configuration $s \in S$ to some configuration
$t \in T$ that goes through some $c \in C$, with non-$\overline{0}$ weight, in two stages. In the first stage, we
find $c$. In the second stage, we find the path through $c$.

Let $\mathcal{A}_C$ be the unweighted automaton that accepts the language $C$, and $\mathcal{A}_\cap = (\mathcal{A}_{post^*} \lhd
\mathcal{A}_{pre^*}) \cap \mathcal{A}_C$. Then path_summary($\mathcal{A}_\cap$) $= WC(S, C, T) \neq \overline{0}$. Using witness generation, we can
find at least one path in $\mathcal{A}_\cap$ whose weight is not $\overline{0}$. A path in this automaton corresponds
to a configuration $c$ with $\mathcal{A}_\cap(c) \neq \overline{0}$. This, in turn, implies that $c \in C$ and there is a path in
the WPDS from $S$ to $T$ through $c$ with non-$\overline{0}$ weight.

Again, using standard witness generation, we can find a set of witnesses $\{\mu_i\}_{1 \le i \le n}$ for
$\mathcal{A}_{post^*}(c) = IJOP(S, c)$ and a set of witnesses $\{\rho_j\}_{1 \le j \le m}$ for $\mathcal{A}_{pre^*}(c) = IJOP(c, T)$,
respectively. The concatenation of these witnesses justifies $IJOP(S, c) \otimes IJOP(c, T)$.
(The concatenation is a constant-time operation because a witness set is represented using
a DAG.) Therefore, one of these witnesses is a path with non-$\overline{0}$ weight and serves as the
desired witness for node $\gamma$. The same procedure can be repeated for each node in the error
projection. Finding witnesses is not a very expensive operation, but it adds a fair amount of
overhead to the execution of poststar and prestar (although their worst-case running times
do not change).

One optimization that witnesses allow is that if we obtain $\sigma$ as a witness for a node $\gamma$
in the error projection, then for every node $\gamma'$ such that a configuration $c \in \gamma'\Gamma^*$ occurs in
$\sigma$, $\gamma'$ must also be in the error projection. Therefore, while computing an error projection,
if we find $\gamma$ to be in the error projection, then we can find a witness for it and immediately
include in the error projection every such $\gamma'$.
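A minimal sketch of this optimization is given below. It assumes a witness is available as a list of configurations `(control_state, stack)` with the top of the stack first, and that a hypothetical oracle `witness_for(node)` returns such a witness when the node is in the error projection and `None` otherwise; neither the representation nor the oracle is taken from an actual implementation.

```python
def nodes_on_witness(witness):
    """Top-of-stack symbols of all configurations along a witness path."""
    return {stack[0] for (_p, stack) in witness if stack}

def error_projection(nodes, witness_for):
    """Collect the error projection, skipping nodes already seen on a witness."""
    projection = set()
    for node in nodes:
        if node in projection:
            continue                      # already justified by an earlier witness
        witness = witness_for(node)       # hypothetical weighted-chop query
        if witness is not None:
            projection.add(node)
            projection |= nodes_on_witness(witness)
    return projection

# Hypothetical witness through node n of the example WPDS.
path = [("p", ("start",)), ("p", ("c1",)), ("p", ("f1", "r1")), ("p", ("n", "r1")),
        ("p", ("f2", "r1")), ("p", ("r1",)), ("p", ("error",))]
queried = []

def witness_for(node):
    queried.append(node)
    return path if node in nodes_on_witness(path) else None

print(sorted(error_projection(["n", "c1", "f2", "dead"], witness_for)))
```

In this toy run only `n` and `dead` trigger a query; `c1` and `f2` are added from the witness found for `n`.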
5.3.2 Computing Data Values

In  this  section,  we  discuss  algorithms  for  computing  the  data  values  for  nodes  in  an  error 
projection.  The  technique  that  we  present  applies  to  relational  weight  domains  (Defn.  2.4.13). 
Note that the value of $WC(S, C, T)$ does not say anything about the required set of values
at $C$: for Fig. 5.1, $WC(S, n\Gamma^*, T) = \{(\_, 10)\}$ but the required memory configuration at $n$ is
$\{9\}$.

Let  V  be  a  finite  set  of  memory  configurations,  i.e.,  an  element  of  V  abstracts  a  collection 
of  valuations  of  program  variables.  In  terms  of  dataflow  analysis,  V  is  a  set  of  dataflow 
facts.  In  terms  of  Boolean  programs,  V  is  the  set  of  valuations  of  Boolean  variables.  If  the 
Boolean  program  is  obtained  using  predicate  abstraction,  an  element  of  V  is  a  valuation  v 
of  all  predicates,  which  represents  an  abstraction  for  all  program  states  that  satisfy  those 
predicates  or  their  negated  form,  according  to  v  (see  Section  1.1.2). 

Let $(D, \oplus, \otimes, \overline{0}, \overline{1})$ be the relational weight domain on $V$. For weights $w, w_1, w_2 \in D$,
define $Rng(w)$ to be the range of $w$, $Dom(w)$ to be the domain of $w$ and $Com(w_1, w_2) =
Rng(w_1) \cap Dom(w_2)$. For a node $\gamma \in EP(S, T)$, we compute the following subset of $V$:
$V_\gamma = \{v \in Com(v(\sigma_1), v(\sigma_2)) \mid s \Rightarrow^{\sigma_1} c \Rightarrow^{\sigma_2} t, s \in S, c \in \gamma\Gamma^*, t \in T\}$. If $v \in V_\gamma$, then
there must be a path in the program model that leads to an error such that the abstract
store $v$ arises at node $\gamma$.

5.3.2.1 An Explicit Algorithm

First, we show how to check for membership in the set $V_\gamma$. Conceptually, we place a
bottleneck at node $\gamma$, using a special weight, to see if there is a feasible path that can
pass through the bottleneck at $\gamma$ with abstract store $v$, and then continue on to the error
configuration. Let $w_v = \{(v, v)\}$. Note that $v \in Com(w_1, w_2)$ iff $w_1 \otimes w_v \otimes w_2 \neq \overline{0}$. Let
$\mathcal{A}_{post^*} = poststar(\mathcal{A}_S)$, $\mathcal{A}_{pre^*} = prestar(\mathcal{A}_T)$ and $\mathcal{A}_\lhd$ be their intersection. Then $v \in V_\gamma$ iff
there is a configuration $c \in \gamma\Gamma^*$ such that $IJOP(S, c) \otimes w_v \otimes IJOP(c, T) \neq \overline{0}$ or, equivalently,
$\mathcal{A}_{post^*}(c) \otimes w_v \otimes \mathcal{A}_{pre^*}(c) \neq \overline{0}$. To check this, we use the functional automaton $\mathcal{A}_\lhd$ again. It
is not hard to check that the following holds for any weight $w$:

\[
\mathcal{A}_{post^*}(c) \otimes w \otimes \mathcal{A}_{pre^*}(c) = \bigoplus \{ W(w) \mid accPath(\mathcal{A}_\lhd, c, W) \}
\]

Then $v \in V_\gamma$ iff $\bigoplus \{W(w_v) \mid accPath(\mathcal{A}_\lhd, \gamma\Gamma^*, W)\} \neq \overline{0}$. Again, this is computable using
path_summary: Intersect $\mathcal{A}_\lhd$ with an unweighted automaton that accepts $\gamma\Gamma^*$, then run
path_summary, but initialize the weight on the initial states of $\mathcal{A}_\lhd$ with $w_v$ instead of $\overline{1}$.
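The bottleneck test can be tried out on explicit relations. The sketch below represents a relational weight as a Python set of pairs, with extend as relational composition and the empty set as $\overline{0}$; the two relations `w_post` and `w_pre` are hypothetical stand-ins for $\mathcal{A}_{post^*}(c)$ and $\mathcal{A}_{pre^*}(c)$, not values read off any figure.

```python
def extend(w1, w2):
    """Relational composition: apply w1 first, then w2."""
    return {(a, c) for (a, b) in w1 for (b2, c) in w2 if b == b2}

def com(w1, w2):
    """Com(w1, w2) = Rng(w1) intersected with Dom(w2)."""
    return {b for (_, b) in w1} & {a for (a, _) in w2}

def passes_bottleneck(w1, v, w2):
    """v is in Com(w1, w2) iff w1 (x) {(v, v)} (x) w2 is not the empty relation."""
    return bool(extend(extend(w1, {(v, v)}), w2))

V = range(12)
w_post = {(x, 9) for x in V}   # hypothetical: every store becomes 9 before the node
w_pre = {(9, 10)}              # hypothetical: only store 9 continues to the error

explicit = {v for v in V if passes_bottleneck(w_post, v, w_pre)}
print(explicit)                # {9}
```

The loop over `V` issues one bottleneck query per abstract store, so the cost of this explicit check grows linearly with the size of `V`.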

This gives us an algorithm for computing $V_\gamma$, but its running time is proportional to $|V|$,
which might be very large. In the case of predicate abstraction, $|V|$ is exponential in the
number of predicates, but the weights (transformers) can be efficiently encoded using BDDs.
For example, the identity transformer on $V$ can be encoded with a BDD of size $\log |V|$. To
avoid  losing  the  advantages  of  using  BDDs,  we  now  present  a  symbolic  algorithm. 

5.3.2.2 A Symbolic Algorithm

Let $Y = \{y_v \mid v \in V\}$ be a fresh set of variables. We switch our weight domain from
being $V \times V$ to $V \times Y \times V$. We write weights in the new domain with superscript $e$.
Intuitively, the triple $(v_1, y, v_2)$ denotes the transformation of $v_1$ to $v_2$ provided the variable
$y$ is "true". Combine is still defined to be union and extend is defined as follows: $w_1^e \otimes w_2^e =
\{(v_1, y, v_2) \mid (v_1, y, v_3) \in w_1^e, (v_3, y, v_2) \in w_2^e\}$. Also, $\overline{1}^e = \{(v, y, v) \mid v \in V, y \in Y\}$ and
$\overline{0}^e = \emptyset$. Define a symbolic identity $id_s^e$ as $\{(v, y_v, v) \mid v \in V\}$. Let $Var(w^e) = \{v \mid (v_1, y_v, v_2) \in
w^e$ for some $v_1, v_2 \in V\}$, i.e., the set of values whose corresponding variable appears in $w^e$.
Given a weight in $V \times V$, define $ext(w) = \{(v_1, y, v_2) \mid (v_1, v_2) \in w, y \in Y\}$, i.e., all variables
are added to the middle dimension. Note that $\overline{1}^e = ext(\overline{1})$. We will use the middle dimension
to remember the "history" when composition is performed: for weights $w_1, w_2 \in V \times V$, it
is easy to prove that $Com(w_1, w_2) = Var(ext(w_1) \otimes id_s^e \otimes ext(w_2))$. Therefore, $V_\gamma = Var(w_\gamma^e)$
where $w_\gamma^e = \bigoplus \{ext(v(\sigma_1)) \otimes id_s^e \otimes ext(v(\sigma_2)) \mid s \Rightarrow^{\sigma_1} c \Rightarrow^{\sigma_2} t, s \in S, c \in \gamma\Gamma^*, t \in T\}$. This
weight is computed by replacing all weights $w$ in the functional automaton with $ext(w)$ and
running path_summary over paths accepting $\gamma\Gamma^*$, and initializing initial states with weight
$id_s^e$. The advantages of this algorithm are: the weight $ext(w)$ can be represented using the
same-sized BDD as the one for $w$ (the middle dimension is "don't-care"); and the weight $id_s^e$
can be represented using a BDD of size $O(\log |V|)$.

For our example, the weight $w_n^e$ read off from the functional automaton shown in Fig. 5.4
is $\{(\_, y_9, 10)\}$, which gives us $V_n = \{9\}$, as desired.
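The identity $Com(w_1, w_2) = Var(ext(w_1) \otimes id_s^e \otimes ext(w_2))$ can be checked on small explicit sets. In the sketch below, sets of triples stand in for the BDD-encoded weights, and a variable $y_v$ is represented simply by its index $v$; both are assumptions made for illustration, and the relations are hypothetical.

```python
def ext(w, V):
    """Add every variable index to the middle dimension of a relation."""
    return {(a, y, b) for (a, b) in w for y in V}

def extend_e(w1, w2):
    """Extend on triples: compose the outer components, keep a shared middle one."""
    return {(a, y, c) for (a, y, b) in w1 for (b2, y2, c) in w2
            if b == b2 and y == y2}

def id_s(V):
    """Symbolic identity {(v, y_v, v)}; y_v is represented by the index v."""
    return {(v, v, v) for v in V}

def var(w):
    """Indices of the variables that occur in the middle dimension."""
    return {y for (_, y, _) in w}

def com(w1, w2):
    return {b for (_, b) in w1} & {a for (a, _) in w2}

V = range(12)
w1 = {(x, 9) for x in V}       # hypothetical weight before the node
w2 = {(9, 10)}                 # hypothetical weight after the node
w_e = extend_e(extend_e(ext(w1, V), id_s(V)), ext(w2, V))
print(var(w_e))                # {9}
```

A single composition yields all of $Com(w_1, w_2)$ at once, in contrast to one bottleneck query per element of `V`.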

5.4  Experiments 

We  carried  out  experiments  to  measure  two  aspects  of  using  error  projections.  First,  we 
measured  the  efficiency  of  our  error-projection  algorithm.  To  put  the  numbers  in  perspective, 
we  compared  the  time  taken  to  compute  an  error  projection  against  the  time  taken  by  a 
reachability  query,  which  provides  a  measure  of  the  amount  of  overhead  that  the  error- 
projection  computation  can  add  to  an  abstraction-refinement  loop.  Second,  we  measured 
the  sizes  of  the  computed  error  projections.  An  error-projection  size  of  50%  implies,  roughly, 
a  2x  speedup  for  all  further  rounds  of  refinement,  because  half  of  the  program  was  proved 
correct  and  would  not  be  considered  subsequently.  Our  results  were  encouraging  in  both 
respects. 

We  added  the  error-projection  algorithm  to  Moped  [85],  a  model  checker  for  Boolean 
programs.  We  changed  the  implementation  of  Moped  so  that  it  first  encodes  a  Boolean 
program  as  an  EWPDS,  and  then  uses  reachability  queries  to  check  assertions  in  the  program. 

We measured the time needed to solve $WC(S, n\Gamma^*, T)$ for all program nodes $n$ using
the algorithms from Section 5.2: one that uses functional automata and the double-prestar
algorithm. Although we report the size of the error projection, we could not validate how
useful it was because only the model (and not the source code) was available to us.

The  results  are  shown  in  Tab.  5.1.  The  table  can  be  read  as  follows:  the  first  two 
columns  give  the  program  names,  and  the  number  of  nodes  in  the  program.  The  Boolean 
programs  were  provided  to  us  by  S.  Schwoon.  They  were  created  by  SLAM  as  a  result  of 
performing  predicate  abstraction  on  real  driver  source  code,  but  the  original  source  code  was 
not  available  to  us. 

The next three columns give the error-projection size relative to program size, and times
to compute poststar(S) and prestar(T), respectively. Columns six and seven give the running
time for solving $WC(S, n\Gamma^*, T)$ for all nodes $n$ using functionals and using double-prestar,
respectively, after the initial computation of poststar(S) and prestar(T) was completed, i.e.,
the time reported for functionals is the time taken to intersect $\mathcal{A}_{post^*}$ and $\mathcal{A}_{pre^*}$ and read
off values from it; and the time reported for double-prestar is the time taken by lines 2-5
of the algorithm. Because the double-prestar method is so slow, we did not run these
examples to completion; instead, we report the time for solving the weighted chop query
for only 1% of the blocks and multiply the resulting number by 100. Column eight shows
the ratio of the running time for using functionals (column six) against the time taken to
compute $post^*(S) + pre^*(T)$ (column four + column five). The last column shows the ratio
of the running time for the entire double-prestar computation (column five + column seven)
against the entire functional computation (column four + column five + column six).
All running times are in seconds. The experiments were run on a 3GHz P4 machine with
2GB RAM.


                                                        $WC(S, n\Gamma^*, T)$      Functional vs.
Prog         Nodes   Error Proj.   post*(S)   pre*(T)   Functional   Double-pre*   Reach     Double-pre*
                                   (sec)      (sec)     (sec)        (sec)         (ratio)   (ratio)
iscsiprt16   4884    0%            79         1.8       3.5          5800          0.04      69
pnpmem2      4813    0%            7          4.1       8.8          16000         0.79      804
iscsiprt10   4824    46%           0.28       0.36      1.6          1200          2.5       536
pnpmem1      4804    65%           7.2        4.5       9.2          17000         0.79      814
iscsi1       6358    84%           53         110       140          750000        0.88      2476
bugs5        36972   99%           13         2         170          85000         11.3      459

Table 5.1 Moped results: The Boolean programs were provided by S. Schwoon. S is the
entry point of the program, and T is the error configuration set. An error projection of size
0% means that the program is correct.
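The two ratio columns can be recomputed from the four timing columns. The short sketch below does this for two rows of Tab. 5.1; the function name `ratios` is chosen here for illustration. The values in the last column match the double-prestar total (pre* time plus double-pre* time) divided by the functional total (post* time plus pre* time plus functional time).

```python
def ratios(post, pre, functional, double_pre):
    """Return (column eight, column nine) of Tab. 5.1 from the timing columns."""
    vs_reach = functional / (post + pre)
    vs_double_pre = (pre + double_pre) / (post + pre + functional)
    return round(vs_reach, 2), round(vs_double_pre)

print(ratios(79, 1.8, 3.5, 5800))      # iscsiprt16 row: (0.04, 69)
print(ratios(0.28, 0.36, 1.6, 1200))   # iscsiprt10 row: (2.5, 536)
```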


Discussion 

As can be seen from the table, using functionals is about three orders of magnitude faster
than using the double-pre* method. Also, as shown in column eight, computation of the error
projection compares fairly well with running a single forward or backward analysis (at least
for the smaller programs). To some extent, this implies that error-projection computation
can be incorporated into model checkers without adding significant overhead.

The  sizes  of  the  error  projections  indicate  that  they  might  be  useful  in  model  checkers. 
Simple slicing, which only deals with the control structure of the program (and no weights),
produced more than 99% of the program in each case, even when the program was correct.

The  result  for  the  last  program  bugs5,  however,  does  not  seem  as  encouraging  due  to 
the  large  size  of  the  error  projection.  We  do  not  have  the  source  code  for  this  program,  but 
investigating  the  model  reveals  that  there  is  a  loop  that  calls  several  procedures  that  contain 
most  of  the  code,  and  the  error  can  occur  inside  the  loop.  If  the  loop  resets  its  state  when 
looping  back,  the  error  projection  would  include  everything  inside  the  loop  or  called  from 
it.  This  is  because  for  every  node,  there  is  a  path  from  the  loop  head  that  goes  through  the 
node,  then  loops  back  to  the  head,  with  the  same  data  state,  and  then  goes  to  error. 

This  seems  to  be  a  limitation  of  error  projections  and  perhaps  calls  for  similar  techniques 
that  only  focus  on  acyclic  paths  (paths  that  do  not  repeat  a  program  state).  However,  for 
use  inside  a  refinement  process,  error  projections  still  give  the  minimal  set  of  nodes  that 
is  sound  with  respect  to  the  property  being  verified  (focusing  on  acyclic  paths  need  not  be 
sound,  i.e.,  the  actual  path  that  leads  to  error  might  actually  be  cyclic  in  an  abstract  model). 

5.5  Additional  Applications 

The techniques presented in Section 5.2 and Section 5.3 give rise to several other
applications of our ideas. In each case, we run one poststar query and one prestar query to obtain
automata $\mathcal{A}_b$ and $\mathcal{A}_f$, respectively, and then create
$\mathcal{A}_{bf} = (\mathcal{A}_b \bowtie \mathcal{A}_f)$. Let $BW(w_{bot}, \gamma)$ be
the weight obtained from the functional automaton $\mathcal{A}_{bf}$ intersected with
$(\gamma\,\Gamma^*)$ and bottleneck weight $w_{bot}$. (The bottleneck weight used in
Section 5.3.2.1 was $w_v$ and the one used in Section 5.3.2.2 was id®, respectively.)
This weight can be computed for all nodes $\gamma$ in roughly the same time as the error
projection (which computes $BW(\overline{1}, \gamma)$).
Multi-threaded Programs

KISS  [78]  is  a  system  that  can  detect  errors  in  concurrent  programs  that  arise  in  at  most 
two  context  switches.  The  two-context-switch  bound  enables  verification  using  a  tool  that 
can  only  handle  sequential  programs.  To  convert  a  concurrent  program  into  one  suitable 
for  a  sequential  analysis,  KISS  adds  nondeterministic  function  calls  to  the  main  method  of 
thread  2  after  each  statement  of  thread  1.  Likewise  it  adds  nondeterministic  function  returns 
after  each  statement  of  thread  2.  It  also  ensures  that  a  function  call  from  thread  1  to  thread 
2  is  only  performed  once.  This  technique  essentially  results  in  a  sequential  program  that 
mimics  the  behavior  of  a  concurrent  program  (with  two  threads)  for  two  context  switches. 

Using  our  techniques,  we  can  extend  KISS  to  determine  all  of  the  nodes  in  thread  1 
where  a  context  switch  can  occur  that  leads  to  an  error  later  in  thread  1.  One  way  to  do 
this  is  to  use  nondeterministic  calls  and  returns  as  KISS  does  and  then  compute  the  error 
projection.  However,  due  to  the  automata-theoretic  techniques  we  employ,  we  can  omit  the 
extra  additions.  The  following  algorithm  shows  how  to  do  this: 

1. Create $\mathcal{A}_{\bowtie} = \mathcal{A}_{post^*} \bowtie \mathcal{A}_{pre^*}$ for thread 1.

2. Let $\mathcal{A}_2$ be the result of a poststar query from main for thread 2. Let
$w = pathsummary(\mathcal{A}_2)$; $w$ represents the state transformation caused by the
execution steps spent in thread 2.

3. For each program node $\gamma$ of thread 1, let $w_\gamma = BW(w, \gamma)$ be the weight
obtained from the functional automaton $\mathcal{A}_{\bowtie}$ of thread 1. By using $w$ as the
bottleneck weight, we account for the two context switches (from thread 1 to 2 and from 2
back to 1); $w$ summarizes the effect produced while thread 2 has control. If
$w_\gamma \neq \overline{0}$, then an error can occur in the program when the first context
switch occurs at node $\gamma$ in thread 1.

Then  this  process  can  be  repeated  after  interchanging  the  roles  of  thread  1  and  thread  2. 
This  allows  thread  2  the  first  chance  to  execute.  Using  this  algorithm,  we  can  determine  all 
the  nodes  where  a  context  switch  must  occur  for  an  error  to  (eventually)  arise. 
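
As a rough illustration of step 3, the following Python sketch works in a much simpler setting than the one described above. It assumes that thread 1 is a single procedure (so there is no stack, and no functional automaton is needed), that weights are relations over a small finite set of shared data states, and that `w` is given directly rather than computed by a poststar query. The helper names `forward`, `backward`, and `switch_points` are hypothetical and are not part of any WPDS library. Under these assumptions, the bottleneck weight at a node reduces to composing the forward relation, the thread-2 summary, and the backward relation.

```python
def compose(r1, r2):
    """Relational composition: (a, c) whenever (a, b) in r1 and (b, c) in r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def forward(edges, entry, data):
    """F[n]: pairs (d0, d) such that some path entry ->* n maps data d0 to d."""
    F = {entry: {(d, d) for d in data}}
    changed = True
    while changed:
        changed = False
        for (src, dst), rel in edges.items():
            new = compose(F.get(src, set()), rel) - F.get(dst, set())
            if new:
                F.setdefault(dst, set()).update(new)
                changed = True
    return F

def backward(edges, error, data):
    """B[n]: pairs (d, d1) such that some path n ->* error maps data d to d1."""
    B = {error: {(d, d) for d in data}}
    changed = True
    while changed:
        changed = False
        for (src, dst), rel in edges.items():
            new = compose(rel, B.get(dst, set())) - B.get(src, set())
            if new:
                B.setdefault(src, set()).update(new)
                changed = True
    return B

def switch_points(edges, entry, error, data, w):
    """Nodes n where F[n] ; w ; B[n] is non-empty (the analogue of a nonzero weight)."""
    F, B = forward(edges, entry, data), backward(edges, error, data)
    return {n for n in F if compose(compose(F[n], w), B.get(n, set()))}

# Toy thread 1 over one shared Boolean x (all of this is illustrative data).
DATA = {0, 1}
EDGES = {
    ("n1", "n2"): {(0, 0), (1, 0)},   # x := 0
    ("n2", "n3"): {(0, 0), (1, 1)},   # skip
    ("n3", "err"): {(1, 1)},          # assume x == 1
}
W = {(0, 0), (1, 1), (0, 1)}          # thread 2 may leave x alone or set x := 1

RESULT = switch_points(EDGES, "n1", "err", DATA, W)
```

In this toy instance the error is unreachable without a context switch; a switch at `n2` or `n3` (after thread 1 has cleared `x`) lets thread 2 set `x` and enable the error, whereas a switch at `n1` does not.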
Error Reporting

The  model  checker  SLAM  [4]  used  a  technique  presented  in  [5]  to  identify  error  causes 
from  counterexample  traces.  Their  main  idea  is  to  remove  “correct”  transitions  from  the  error 
trace;  the  remaining  transitions  indicate  the  cause  of  the  error.  These  correct  transitions 
were obtained by a backward analysis from non-error configurations. However, no restriction
was imposed requiring that these transitions also be reachable from the entry point of the program.
Thus,  their  technique  may  remove  too  many  transitions  and  fail  to  localize  the  error.  Using 
annotated  error  projections,  we  can  limit  the  correct  transitions  to  ones  that  are  both  forward 
reachable  from  program  entry  and  backward  reachable  from  the  non-error  configurations. 

5.6  Related  Work 

The combination of forward and backward analysis has a long history in abstract
interpretation, going back to Cousot’s thesis [23]. It has also been used in model checking
[62] and in interprocedural analysis [41]. In this chapter, we have shown how forward and
backward  approaches  can  be  combined  precisely  in  the  context  of  interprocedural  analysis 
performed  with  WPDSs;  our  experiments  show  that  this  approach  is  significantly  faster  than 
a  more  straightforward  one. 

With model checkers becoming more popular, there has been considerable work on
explaining the results obtained from a model checker in an attempt to localize the fault in the
program [20, 5]. These approaches are complementary to ours. They build on information
obtained from reachability analysis performed by the model checker and use certain
heuristics to isolate the root cause of the bug. Error projections seek to maximize information
that can be obtained from the reachability search so that other tools can take advantage of
this gain in precision. This chapter focused on using error projections inside an
abstraction-refinement loop. The second application in Section 5.5 briefly shows how they can be used
for  fault  localization.  It  would  be  interesting  to  explore  further  use  of  error  projections  for 
fault  localization. 
Such error-reporting techniques have also been used outside model checking. Kremenek et
al. [54] use statistical analysis to rank counterexamples found by the xgcc [28] compiler. Their
goal  is  to  present  to  the  user  an  ordered  list  of  counterexamples  sorted  by  their  confidence 
rank. 

The  goal  of  both  program  slicing  [93]  and  our  work  on  error  projection  is  to  compute  a  set 
of  nodes  that  exhibit  some  property.  In  our  work,  the  property  of  interest  is  membership  in 
an  error  path,  whereas  in  the  case  of  program  slicing,  the  property  of  interest  is  membership 
in  a  path  along  data  and  control  dependence  edges.  Slicing  and  chopping  have  certain 
advantages — for  instance,  chopping  filters  out  statements  that  do  not  transmit  effects  from 
source  s  to  target  t.  These  techniques  have  been  generalized  by  Hong  et  al.  [39],  who  show 
how  to  perform  more  precise  versions  of  slicing  and  chopping  using  predicate-abstraction  and 
model  checking.  However,  their  methods  are  intraprocedural,  whereas  our  work  addresses 
interprocedural  analysis. 

Mohri  et  al.  investigated  the  intersection  of  weighted  automata  in  their  work  on  natural- 
language  recognition  [64,  65].  For  their  weight  domains,  the  extend  operation  must  be 
commutative.  We  do  not  require  this  restriction. 
Chapter 6

Interprocedural Analysis of Concurrent Programs Under a Context Bound

The  analysis  of  concurrent  programs  is  a  challenging  problem.  While,  in  general,  the 
analysis  of  both  concurrent  and  sequential  programs  is  undecidable,  what  makes  concurrency 
hard  is  the  fact  that  even  for  simple  program  models,  the  presence  of  concurrency  makes 
their  analysis  computationally  very  expensive.  When  the  model  of  each  thread  is  a  finite- 
state  automaton,  the  analysis  of  such  systems  is  PSPACE-complete;  when  the  model  is  a 
pushdown  system,  the  analysis  becomes  undecidable  [80].  This  is  unfortunate  because  it 
does  not  allow  the  advancements  made  on  such  models  in  the  sequential  setting,  i.e.,  when 
the  program  has  only  one  thread,  to  be  applied  in  the  presence  of  concurrency. 

Another  consequence  of  the  above  result  is  that  the  analysis  of  concurrent  programs 
that  may  contain  recursion  is  undecidable.  Even  in  the  absence  of  recursion,  designing 
an  interprocedural  analysis,  i.e.,  an  analysis  that  can  precisely  reason  about  the  call-return 
semantics  of  a  procedure  call,  becomes  hard.  As  a  consequence,  to  deal  with  concurrency 
soundly,  most  analyses  give  up  precise  handling  of  procedures  and  become  context-insensitive. 
Alternatively,  tools  can  use  inlining  to  unfold  multi-procedure  programs  into  single-procedure 
ones.  This  approach  cannot  handle  recursive  programs,  and  can  cause  an  exponential  blowup 
in  the  size  for  non-recursive  ones. 

Because  interprocedural  analyses  have  proven  to  be  very  useful  for  sequential  programs 
[4,  83,  81,  84],  it  is  desirable  to  have  the  same  kind  of  precision  even  for  concurrent  programs. 
A different way to sidestep the undecidability of analyzing concurrent recursive programs
is  to  limit  the  amount  of  concurrency  by  bounding  the  number  of  context  switches,  where 
a  context  switch  is  defined  as  the  transfer  of  control  from  one  thread  to  another.  Such  an 
abstraction  has  proven  to  be  useful  for  program  analysis  because  many  bugs  can  be  found 
in a few context switches [78, 77, 70, 59]. We use the term context-bounded analysis (CBA)
to  refer  to  the  general  approach  of  analyzing  recursive  and  concurrent  programs  under  a 
context  bound. 

CBA  does  not  impose  any  bound  on  the  execution  length  between  context  switches. 
Thus,  even  under  a  context  bound,  the  analysis  still  has  to  consider  the  possibility  that  the 
next  switch  takes  place  in  any  one  of  the  (possibly  infinite)  states  that  may  be  reached  after 
a  context  switch.  Because  of  this,  CBA  still  considers  many  concurrent  behaviors  [70]. 

In  previous  work,  Qadeer  and  Rehof  [77]  showed  that  CBA  is  decidable  when  program 
threads  are  modeled  using  pushdown  systems  (i.e.,  for  recursive  programs  under  a  finite- 
state  abstraction  of  program  data).  In  this  chapter,  we  generalize  their  result  to  weighted 
pushdown  systems  (i.e.,  to  recursive  programs  under  infinite-state  data  abstractions),  and 
also  provide  a  new  symbolic  algorithm  for  the  finite-state  case. 

Our goal is to be able to take any abstraction used for interprocedural analysis of
sequential programs and directly extend it to handle context-bounded concurrency. Our main result
follows in the spirit of coincidence theorems in dataflow analysis (for sequential programs)
[44, 88, 52]. We give conditions on the abstractions under which CBA can be precisely solved,
along with an algorithm. In addition to the usual conditions required for precise
interprocedural analysis of sequential programs, we require the existence of a tensor product (defined
in Section 6.5). We show that these conditions are satisfied by a class of abstractions, thus
giving precise algorithms for CBA with those abstractions. These include finite-state
abstractions, such as the ones used for verification of Boolean programs in model checking [4],
as well as infinite-state abstractions, such as affine-relation analysis (ARA) [67]. Note that
without  a  context  bound,  reasoning  about  concurrent  programs  under  these  abstractions  is 
undecidable  [80,  66]. 
We show that when a WPDS is used to model each thread of a concurrent program, CBA
can  be  precisely  carried  out  for  the  program,  provided  tensor  products  exist  for  the  weights. 

Motivation 

Context-bounded  analysis  is  not  sound  because  it  does  not  capture  all  of  the  behaviors 
of  a  program;  however,  it  has  been  shown  to  be  useful  for  program  analysis.  KISS  [78],  a 
verification  tool  for  CBA  with  a  fixed  context  bound  of  2,  found  numerous  bugs  in  device 
drivers.  A  study  with  an  explicit-state  model  checker  [70],  which  works  on  programs  with  a 
closed  environment,  found  more  bugs  with  slightly  higher  context  bounds.  It  also  showed  that 
the  amount  of  additional  state  space  covered  decreases  with  each  increment  of  the  context 
bound.  Thus,  even  a  small  context  bound  is  sufficient  to  cover  many  program  behaviors, 
and  proving  safety  under  a  context  bound  should  provide  confidence  towards  the  reliability 
of  the  program. 

Unlike  the  above-mentioned  work,  this  dissertation  addresses  CBA  with  any  given  context 
bound  and  with  different  program  abstractions  (including  ones  that  would  cause  explicit- 
state  model  checkers  not  to  terminate). 

In  Chapter  7,  we  add  to  the  above  results  on  the  utility  of  CBA.  Using  a  symbolic  model 
checker  that  works  on  programs  with  an  open  environment,  we  showed  that  many  bugs  can 
be  found  in  a  few  context  switches.  Motivated  by  these  reasons,  our  goal  is  to  develop 
analyses  that  are  sound  under  a  context  bound. 

Previous  work  has  only  considered  CBA  for  a  restricted  set  of  abstractions.  Having  the 
ability  to  do  CBA  with  other  abstractions  can  be  useful  for  analyzing  concurrent  programs. 
For  example,  it  can  be  useful  to  combine  CBA  with  ARA.  This  is  illustrated  by  the  program 
snippet in Fig. 6.1. Here, multiple threads share the circular buffer q in a producer (enq) /
consumer (deq) fashion. Using CBA with ARA with modular arithmetic [68], one can
prove (under a given context bound) that (tl - hd - cnt) % SIZE = 0 provided SIZE is
a prime power. ARA generalizes analyses like copy-constant propagation, linear-constant
propagation, and induction-variable analysis. It can be used to find invariants, such as the
one  shown  above,  to  do  verification  or  to  increase  the  precision  of  other  analyses. 


Elem q[SIZE];
int hd = 0, tl = 0, cnt = 0;

void enq(Elem e) {
  while (true) {
    atomic {
      if (cnt < SIZE) {
        q[tl] = e;
        tl = (tl + 1) % SIZE;
        cnt++;
        break;
      }
    }
  }
}

Elem deq() {
  while (true) {
    atomic {
      if (cnt > 0) {
        Elem e = q[hd];
        hd = (hd + 1) % SIZE;
        cnt--;
        return e;
      }
    }
  }
}


Figure  6.1  A  concurrent  program  that  manages  a  circular  queue. 
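
The relation among hd, tl, and cnt can be sanity-checked by simulation. The Python sketch below is only a test harness, not an affine-relation analysis: it assumes that each atomic block of Fig. 6.1 executes indivisibly, so that every interleaving is a sequence of enq and deq blocks, and it checks the relation in the form (tl - hd - cnt) % SIZE == 0, the orientation that matches the updates in the figure (tl advances with each enqueue, hd with each dequeue). The function name `simulate` and the choice SIZE = 8 are illustrative assumptions. Unlike ARA, which would prove the invariant, random testing can only fail to find a counterexample.

```python
import random

SIZE = 8  # a prime power (2**3), matching the proviso in the text

def simulate(steps, seed):
    """Run a random sequence of enq/deq atomic blocks; return False if the relation ever breaks."""
    rng = random.Random(seed)
    hd = tl = cnt = 0
    for _ in range(steps):
        if rng.random() < 0.5:
            # atomic block of enq: only takes effect when the buffer is not full
            if cnt < SIZE:
                tl = (tl + 1) % SIZE
                cnt += 1
        else:
            # atomic block of deq: only takes effect when the buffer is not empty
            if cnt > 0:
                hd = (hd + 1) % SIZE
                cnt -= 1
        # Python's % is non-negative for a positive modulus, so this is a true congruence test
        if (tl - hd - cnt) % SIZE != 0:
            return False
    return True

ALL_OK = all(simulate(1000, seed) for seed in range(20))
```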


The  context  bound  used  for  CBA  can  be  increased  iteratively  to  consider  more  effects 
of  concurrency  and  to  analyze  more  program  behaviors.  This  has  the  added  advantage 
of  finding  bugs  in  the  fewest  context  switches  needed  to  trigger  them.  It  is  reasonable  to 
consider  a  bug  that  arises  only  after  a  greater  number  of  context  switches  to  be  “harder” 
than  a  bug  that  requires  fewer  context  switches.  Thus,  CBA  allows  additional  concurrency 
to  be  considered  “on-demand”  by  increasing  the  context  bound. 

Challenges  and  Techniques 

Between  consecutive  context  switches,  a  concurrent  program  acts  like  a  sequential  pro¬ 
gram  because  only  one  thread  is  executing.  However,  a  recursive  thread  can  reach  an  infinite 
number  of  states  before  the  next  context  switch  because  it  has  an  unbounded  stack.  CBA 
has  to  consider  the  possibility  of  a  context  switch  occurring  at  any  one  of  these  states. 

The Qadeer-Rehof (QR) algorithm uses PDSs to encode program threads. The set of
reachable states of a PDS can be represented using an automaton [15] (see Section 2.3.2).
The QR algorithm makes use of this result to get a handle on all reachable states between
context switches. However, to explore all possible context switches, it crucially relies on the
finiteness of the data abstraction because it enumerates all reachable data states at a context
switch.
Our first step is to develop a new algorithm for the case of PDSs. Our motivation is to
have an algorithm that is more likely to generalize to handle other abstractions. The new
algorithm (Section 6.3) represents the effect of executing a thread (a PDS) from any arbitrary
state using a finite-state transducer. The transducer accepts a pair $(c_1, c_2)$ if a thread, when
started in state $c_1$, can reach state $c_2$. Caucal [17] showed that such transducers can be
constructed  for  PDSs.  This  is  a  more  general  result  than  the  one  on  reachability  in  PDSs 
that  was  used  in  the  QR  algorithm.  Next,  to  describe  the  behavior  of  the  entire  program  with 
multiple  threads,  these  transducers  are  composed.  One  transducer  composition  is  performed 
for  each  context  switch. 

We  then  generalize  this  algorithm  for  WPDSs  (Section  6.4  and  Section  6.5).  The  weights 
(or  the  data  abstraction)  add  several  complications.  We  define  weighted  transducers  to 
capture  the  reachability  relation  of  WPDSs.  We  show  that  a  weighted  transducer  can  always 
be  constructed  for  a  WPDS  (no  such  result  was  known  previously).  The  next  step  is  to 
compose  these  transducers.  While  weighted  automata  and  transducers  have  been  considered 
in  the  literature  before,  the  weights  are  assumed  to  have  much  stronger  properties  (especially 
commutativity,  which  defeats  the  purpose  of  CBA  by  making  thread  interleavings  redundant, 
as  we  shall  see  later).  For  program  analysis,  we  only  have  weaker  properties  on  weights.  To 
compose  weighted  transducers,  we  require  that  weight  domains  provide  a  tensor-product 
operation  (Section  6.5).  Tensor  products  have  been  used  previously  in  program  analysis  for 
combining  abstractions  [71].  However,  we  use  them  in  a  different  context  and  for  a  completely 
different  purpose.  In  particular,  previous  work  has  used  them  for  combining  abstractions 
that are to be performed in lock-step; in contrast, we use them to stitch together the data
state  before  a  context  switch  with  the  data  state  after  a  context  switch.  This  is  non-trivial 
because  the  data  state  is  correlated  with  an  (unbounded)  program  stack. 

By  using  WPDSs,  not  only  do  we  obtain  new  algorithms  for  infinite-state  abstractions, 
but  also  symbolic  algorithms  for  finite-state  abstractions.  The  latter  algorithms  avoid  the 
enumeration  that  the  QR  algorithm  performs  at  a  context  switch. 

The  contributions  of  the  work  presented  in  this  chapter  can  be  summarized  as  follows: 
• We give sufficient conditions under which CBA is decidable, along with an algorithm.
This generalizes previous work on CBA of PDSs [77]. Our result also proves that CBA
can be decided for affine-relation analysis, i.e., we can precisely find all affine
relationships between program variables that hold at a particular point in the (concurrent)
program. We use WPDSs as our program model, and the weights encode the program’s
data abstraction. By using WPDSs, we can also answer “stack-qualified” queries [83],
which ask for the set of values that may arise at a program point in a given calling
context, or in a regular set of calling contexts.

•  We  show  that  for  WPDSs,  the  reachability  relation  can  be  encoded  using  a  weighted 
transducer  (Section  6.4),  generalizing  a  previous  result  for  PDSs  by  Caucal  [17]. 

•  We  give  precise  algorithms  for  composing  weighted  transducers  (Section  6.5),  when 
tensor  products  exist  for  the  weights.  This  generalizes  previous  work  on  manipulating 
weighted  automata  and  transducers  [64,  65].  We  also  show  a  class  of  abstractions  that 
satisfies  this  property. 

•  We  discuss  implementation  issues  for  realizing  CBA  in  Section  6.6.  We  show  that  for 
PDSs,  CBA  is  NP-complete.  Our  algorithm,  based  on  transducers,  does  have  a  large 
complexity,  but  it  is  more  amenable  to  symbolic  techniques  such  as  using  BDDs  (in 
the  finite-state  case)  than  the  QR  algorithm. 

The  rest  of  the  chapter  is  organized  as  follows.  In  Section  6.1,  we  formally  define  CBA. 
In  Section  6.2,  we  discuss  previous  work  on  CBA  under  a  finite-state  data  abstraction.  In 
Section  6.3,  we  present  our  algorithm  for  PDSs,  which  is  based  on  transducers.  The  later 
sections  generalize  this  result  to  WPDSs.  In  Section  6.4,  we  give  an  efficient  construction  for 
transducers  for  WPDSs.  In  Section  6.5,  we  show  how  weighted  transducers  can  be  composed. 
In  Section  6.6,  we  discuss  implementation  issues  for  CBA.  In  Section  6.7,  we  discuss  related 
work.  A  further  generalization  of  CBA  is  discussed  in  Chapter  7. 
6.1 Problem Definition

Notation. A binary relation on a set $S$ is a subset of $S \times S$. If $R_1$ and $R_2$ are binary
relations on $S$, then their relational composition $(R_1; R_2)$ is defined as
$\{(s_1, s_3) \mid \exists s_2 \in S,\ (s_1, s_2) \in R_1,\ (s_2, s_3) \in R_2\}$. If $R$ is a binary
relation, $R^i$ is the relational composition of $R$ with itself $i$ times, and $R^0$ is the identity
relation on $S$. $R^* = \bigcup_{i=0}^{\infty} R^i$ is the reflexive-transitive closure of $R$.
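
For finite sets, these definitions translate directly into code. The following Python sketch represents a binary relation as a set of pairs; the function names `compose`, `power`, and `star` are illustrative choices, and the `universe` argument (the set S) is needed only to build the identity relation.

```python
def compose(r1, r2):
    """Relational composition (R1; R2)."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def power(r, i, universe):
    """R^i: R composed with itself i times; R^0 is the identity on the universe."""
    result = {(s, s) for s in universe}
    for _ in range(i):
        result = compose(result, r)
    return result

def star(r, universe):
    """R^*: reflexive-transitive closure, computed as a least fixpoint."""
    closure = {(s, s) for s in universe}
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger

# Example: a chain 0 -> 1 -> 2 -> 3
U = {0, 1, 2, 3}
R = {(0, 1), (1, 2), (2, 3)}
R2 = power(R, 2, U)      # pairs two steps apart
RSTAR = star(R, U)       # all pairs (a, b) with a <= b
```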

The  Unweighted  Case 

First,  we  define  CBA  under  a  finite-state  data  abstraction,  i.e.,  when  PDSs  are  used 
as  the  abstract  model  of  threads.  To  distinguish  this  case  from  the  general  one,  and  in 
keeping  with  the  nomenclature  used  in  previous  work,  we  call  CBA  under  a  finite-state  data 
abstraction  context-bounded  model  checking  (CBMC). 

The abstract model for CBMC is a concurrent PDS, defined as a sequence of PDSs,
$(\mathcal{P}_1, \mathcal{P}_2, \cdots, \mathcal{P}_n)$, $\mathcal{P}_i = (P, \Gamma_i, \Delta_i)$,
which have the same set of control locations ($P$). A configuration of this model is a tuple
$(p, u_1, \cdots, u_n)$, where $p \in P$, $u_i \in \Gamma_i^*$. The transition
system of this model, which is a binary relation on the set of all such configurations, is
defined as follows. Let the transition relation of $\mathcal{P}_i$ be $\Rightarrow_i$. If
$(p, u_i) \Rightarrow_i (p', u_i')$, then we say
$(p, u_1, \cdots, u_i, \cdots, u_n) \Rightarrow_i^c (p', u_1, \cdots, u_i', \cdots, u_n)$.
The union of $\Rightarrow_i^c$ for all $i = 1$ to $n$ defines
the transition relation for the CBMC model, i.e., it defines a single execution step of the
model.

Concurrent PDSs can encode finite-state data abstractions of recursive concurrent
programs. Consider a concurrent Boolean program, defined as a set of Boolean programs, one for
each thread, in which the global variables are shared between the threads. (Thus, any of the
threads can modify the global variables and their own copy of the local variables, but they
cannot directly read from or write to local variables of other threads.) Synchronization
between threads can be easily implemented, e.g., by using global variables to implement locks.
Such models are a natural description of recursive concurrent programs with a finite-state
data abstraction. We do not consider dynamic creation of threads in our model.1

Concurrent Boolean programs can be encoded using a concurrent PDS. Let $B$ be a
concurrent Boolean program with $n$ threads $t_1, t_2, \cdots, t_n$. Let $G$ be the set of global states
of $B$ (valuations of global variables) and $L_i$ be the set of local states of $t_i$, which includes
valuations of local variables as well as the program stack (as described in Section 2.2). Then
the state space of $B$ consists of the global state paired with local states of each of the threads,
i.e., the set of states is $G \times L_1 \times \cdots \times L_n$.

Let $\mathcal{P}_{t_i}$ be the PDS that encodes the Boolean program $t_i$ (Section 2.3). Because different
threads share the same global variables, the PDSs $\mathcal{P}_{t_i}$ have the same set of control locations
(which is $G$). Then the concurrent PDS
$\mathcal{P}_B = (\mathcal{P}_{t_1}, \mathcal{P}_{t_2}, \cdots, \mathcal{P}_{t_n})$ encodes $B$. It is easy to
see that the transition relation of $\mathcal{P}_B$ describes a single execution step of $B$.

The CBMC problem is to find the set of reachable states of a concurrent PDS
$(\mathcal{P}_1, \mathcal{P}_2, \cdots, \mathcal{P}_n)$ under a bound on the number of context switches.
Formally, let $k$ be the bound on context switches. The execution of a concurrent program can be
decomposed into a sequence of execution contexts. In an execution context, one thread has control and it
executes a finite number of steps. The execution context changes at a context switch and
control is passed to a different thread. For $k$ context switches, there must be $k + 1$ execution
contexts. Let $\Rightarrow_{ec}$ be $(\bigcup_{i=1}^{n} (\Rightarrow_i^c)^*)$, the transition relation
that describes the effect of one execution context. Then the CBMC problem is to find the set of
reachable states in the transition relation given by $(\Rightarrow_{ec})^{k+1}$. Note that while a
bound is placed on the number of context switches, no bound is placed on the length of an individual
execution context.
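
The definition can be made concrete with an explicit-state sketch. The Python code below assumes a drastically simplified model in which each thread has finitely many local states and no stack, so configurations can be enumerated; it is not the PDS-based analysis described in this chapter. A configuration is a pair `(g, locs)` of a global state and a tuple of local states, each thread is given by a hypothetical step function mapping `(g, l)` to a set of successors `(g', l')`, and `context_bounded_reach` applies one execution-context relation k + 1 times.

```python
def run_thread(states, i, step):
    """Image of `states` under the reflexive-transitive closure of thread i's steps."""
    seen = set(states)
    work = list(states)
    while work:
        g, locs = work.pop()
        for g2, l2 in step[i](g, locs[i]):
            nxt = (g2, locs[:i] + (l2,) + locs[i + 1:])
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

def context_bounded_reach(init, step, k):
    """States reachable from `init` in k + 1 execution contexts (at most k switches)."""
    current = set(init)
    for _ in range(k + 1):
        nxt = set()
        for i in range(len(step)):        # union over the thread that owns this context
            nxt |= run_thread(current, i, step)
        current = nxt
    return current

# Toy program (illustrative): the global g only reaches 3 after the schedule t0, t1, t0.
def t0(g, l):
    if l == 0 and g == 0:
        return {(1, 1)}
    if l == 1 and g == 2:
        return {(3, 2)}
    return set()

def t1(g, l):
    if l == 0 and g == 1:
        return {(2, 1)}
    return set()

INIT = {(0, (0, 0))}
GLOBALS = {k: {g for g, _ in context_bounded_reach(INIT, [t0, t1], k)} for k in range(3)}
```

In this toy program, the global value 3 first becomes reachable with a bound of two context switches, which illustrates how raising the bound exposes additional behaviors.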

Analysis of concurrent Boolean programs is undecidable [80], i.e., it is not possible to
decide whether a given state is reachable under the transition system $(\Rightarrow_{ec})^*$, but Qadeer
and Rehof showed that CBMC, i.e., reachability under $(\Rightarrow_{ec})^{k+1}$ for a fixed $k$, is decidable.

1 Dynamic creation of up to n threads can be encoded in the model [77]. Moreover, for CBA that considers
k context switches, n can be bounded by k because other threads would never get a chance to run.
The Weighted Case

Now  we  define  CBA  in  the  general  case,  i.e.,  when  WPDSs  are  used  as  the  abstract  model 
for  each  thread. 

The transition relation of a WPDS is a weighted relation (Defn. 2.4.15) over the set of
PDS configurations. For configurations $c_1$ and $c_2$, if $r_1, \cdots, r_m$ are all the rules such that
$c_1 \Rightarrow^{r_i} c_2$, then $(c_1, c_2, \oplus_i f(r_i))$ is in the weighted relation of the WPDS. In a slight abuse
of notation, we will use $\Rightarrow$ and its variants for the weighted transition relation of a WPDS.
Note that the weighted relation maps the configuration pair $(c_1, c_2)$ to IJOP$(\{c_1\}, \{c_2\})$.

The  CBA  problem  is  the  same  as  the  one  for  CBMC,  except  that  all  relations  are  weighted. 
This  means  that  each  thread  is  modeled  as  a  WPDS  and  their  underlying  PDSs  have  the 
same  set  of  control  states. 

Given the weighted relation $R = (\Rightarrow_{ec})^{k+1}$, the set of initial configurations $S$ and a set of
final configurations $T$, we want to be able to solve for $R(S, T) = \oplus\{R(s, t) \mid s \in S, t \in T\}$.
This captures the net transformation on the data state between $S$ and $T$; it is the combine
over the values of all paths involving at most $k$ context switches that go from a configuration
in $S$ to a configuration in $T$. Our results from Section 6.4 and Section 6.5 allow us to solve
for this value when $S$ and $T$ are regular sets of configurations.
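
To make the query concrete, the following Python sketch instantiates it for one particular weight domain, chosen here purely for illustration: the min-plus semiring, in which combine is `min`, extend is `+`, and the weight of a pair is the length of the shortest trace between the two configurations. A weighted relation over a finite set is stored as a dictionary from pairs to weights, with absent pairs standing for the zero weight (infinity). The names `w_compose` and `query` are hypothetical, and composition of weighted relations is assumed to combine over the intermediate element.

```python
import math

def w_compose(r1, r2):
    """Weighted composition: (R1; R2)(s, u) = min over t of R1(s, t) + R2(t, u)."""
    out = {}
    for (s, t), w1 in r1.items():
        for (t2, u), w2 in r2.items():
            if t == t2:
                out[(s, u)] = min(out.get((s, u), math.inf), w1 + w2)
    return out

def query(r, S, T):
    """R(S, T): the combine (here, min) of R(s, t) over all s in S and t in T."""
    return min((w for (s, t), w in r.items() if s in S and t in T), default=math.inf)

# Illustrative weighted relation: edge weights are step counts.
R = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}
R_TWICE = w_compose(R, R)   # the only two-hop pair is a -> b -> c, with weight 1 + 2

ANSWER_ONE_HOP = query(R, {"a"}, {"b", "c"})
ANSWER_TWO_HOP = query(R_TWICE, {"a"}, {"c"})
ANSWER_NONE = query(R, {"c"}, {"a"})
```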

This problem definition allows one to precisely encode concurrent Boolean programs (with
variations such as finding the shortest trace), as well as concurrent affine programs, when
each has only global variables. (The extension to local variables requires that the threads
be modeled using EWPDSs, which is left as future work.)

For  example,  consider  two  copies  of  the  program  in  Fig.  2.2 (a)  running  in  parallel.  Let 
the CFG nodes of the second copy be $\Gamma' = \{n'_1, \cdots, n'_8\}$, to distinguish them from those of
the first copy. With $k = 2$, $S = \{(p, n_1, n'_1)\}$ (the starting configuration of the program),
T  =  {(p,  n^u')  |  u'  G  (F7)*}  (thread  1  is  at  n%  and  thread  2  can  have  any  stack),  and  R  as 
above,  the  weight  R(S,  T )  is  a  relation  with  range  {(3,  3),  (3,  7),  (7,  3),  (7,  7)},  meaning  that 
these  valuations  of  (x,  y)  are  possible  at  some  configuration  in  T. 
6.2 Context Bounded Model Checking

In  this  section,  we  describe  the  Qadeer-Rehof  (QR)  algorithm  for  CBMC.  It  works  under 
the  assumption  that  the  set  G  is  finite.  Under  such  an  abstraction,  the  only  source  of 
unboundedness  is  the  program  stack. 

The  algorithm  proceeds  by  iteratively  increasing  the  number  of  execution  contexts. 
Within  one  execution  context,  the  global  state  can  be  considered  local  to  the  executing 
thread  because  it  is  the  only  thread  that  accesses  it.  At  a  context  switch,  the  global  state 
is  synchronized  with  other  threads  so  that  they  have  the  same  view  of  the  shared  memory. 
The  algorithm  needs  G  to  be  finite  to  be  able  to  explore  all  possibilities  at  a  context  switch. 
We  only  give  an  overview  of  the  QR  algorithm,  presenting  it  mostly  in  terms  of  explicit 
state  spaces  and  just  touch  on  a  few  aspects  of  a  PDS-based  implementation.  A  complete 
description  of  the  PDS-based  implementation  is  given  in  [77]. 

If $S_i \subseteq L_i$ is a set of local states, then let $(g, S_1, S_2, \cdots, S_n)$ be the set of states
$\{(g, l_1, \cdots, l_n) \mid l_i \in S_i\}$. We use the symbol $\eta$ as a shorthand for such a set of states.
The QR algorithm is a worklist-based algorithm. An item on the worklist is a pair $(\eta, i)$,
denoting that the set of states $\eta$ is reachable in up to $i$ context switches. Initially, the worklist
contains $(\eta_{init}, 0)$, where $\eta_{init}$ is the starting set of states for the program. Then the algorithm
repeats the following steps until the worklist is empty.

1. Select and remove an item $(\eta, i)$ from the worklist. If $i = k$, then the context bound
has been reached, so pick another item.

2. Let $\eta = (g, S_1, \cdots, S_n)$. For each $j$ from 1 to $n$, repeat steps 3 and 4.

3. Using a thread-local analysis on $t_j$, find the set of states that $t_j$ can reach when started
from the set of states $(g, S_j)$. Let this set be $R_j$, i.e., $(g, S_j) \Rightarrow^*_{t_j} R_j$. In PDS terms,
$R_j = post^*_{t_j}((g, S_j))$. Write $R_j$ as $\bigcup_{p=1}^{m} (g_p, R_j^p)$. This implies that thread $t_j$ can change
the global state from $g$ to $g_p$ and itself reach some local state in $R_j^p$.
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(9oo’  So,  T0) 

R  =  post/fgoo,  S0) 


[(S2,  R’)  |  R'  =  t  (g2„T,),  (g22.T2) . (g2p,Tp) } 

(921  >  S2,  Ti)  (922,  S2,  T2)  ■"  (02p’  S2,  Tp) 


Figure  6.2  The  computation  of  the  QR  algorithm,  for  two  threads,  shown  schematically  in 
the  form  of  a  tree.  The  shaded  boxes  are  just  temporary  placeholders  and  are  not  inserted 
into  the  worklist.  The  thick  arrows  correspond  to  Step  3  and  other  arrows  correspond  to 
Step  4.  The  set  of  tuples  at  level  i  of  the  tree  correspond  to  all  states  reached  in  i  context 

switches. 


4.  For  each  gp  produced  in  the  above  step,  the  set  of  states  gp  = 
(gp,  Si,  ■  ■  ■  ,  Sj-i,  R*j,  Sj+i,  •  •  •  ,Sn)  are  reachable  in  up  to  i  +  1  context  switches. 
Insert  (■ gp ,  %  +  1)  into  the  worklist. 

Steps 3 and 4 take a starting set of states $\eta$ and produce all states that are reachable in
one execution context. First, a thread $t_j$ is picked that gets to execute in that context. Then
step 3 finds all states that execution of $t_j$ can produce. In step 4, each global state $g_p$ that
can be produced is passed to all other threads at the context switch. The set of
tuples $(\eta, i)$ with $i = k$ represents the set of all reachable states. The computation performed
by this algorithm is depicted in Fig. 6.2 in the form of a tree.
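
The worklist loop can be sketched on an explicit, finite-state model. This is only an illustration under stated assumptions, not the PDS-based implementation of [77]: each thread is given as a hypothetical function `step(g, l)` returning its successor pairs, the thread-local analysis of step 3 is replaced by a plain graph search, and a `seen` set (not part of the description above) is added so that the loop terminates without revisiting items. All names (`thread_reach`, `qr_worklist`, `concrete_states`, `t0`, `t1`) are made up for this sketch.

```python
from collections import deque
from itertools import product

def thread_reach(step, g, S):
    """Explicit-state stand-in for post*: all (g', l') one thread reaches from {(g, l) | l in S}."""
    seen = {(g, l) for l in S}
    stack = list(seen)
    while stack:
        cur_g, cur_l = stack.pop()
        for nxt in step(cur_g, cur_l):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def qr_worklist(steps, g0, locals0, k):
    """Return every worklist item (eta, i) generated under context bound k."""
    init = (g0, tuple(frozenset([l]) for l in locals0))
    worklist = deque([(init, 0)])
    seen = {(init, 0)}                      # assumption: deduplicate items
    while worklist:
        (g, sets), i = worklist.popleft()   # step 1
        if i == k:
            continue                        # context bound reached
        for j, step in enumerate(steps):    # step 2
            reach = thread_reach(step, g, sets[j])          # step 3
            by_global = {}
            for gp, l in reach:             # group R_j by resulting global state g_p
                by_global.setdefault(gp, set()).add(l)
            for gp, r_jp in by_global.items():              # step 4: the fan-out over g_p
                eta_p = (gp, sets[:j] + (frozenset(r_jp),) + sets[j + 1:])
                item = (eta_p, i + 1)
                if item not in seen:
                    seen.add(item)
                    worklist.append(item)
    return seen

def concrete_states(items):
    """Expand items (g, S_1, ..., S_n) into explicit states (g, l_1, ..., l_n)."""
    out = set()
    for (g, sets), _ in items:
        for ls in product(*sets):
            out.add((g,) + ls)
    return out

# Toy program: thread 0 moves a -> b and sets g from 0 to 1;
# thread 1 moves x -> y and sets g from 1 to 2.
def t0(g, l):
    return {(1, "b")} if (g, l) == (0, "a") else set()

def t1(g, l):
    return {(2, "y")} if (g, l) == (1, "x") else set()
```

In the toy program, the state (2, b, y) needs thread 0 to run first and thread 1 second, so it only shows up once the bound allows items with $i = 2$. The inner loop over `by_global` is the fan-out that Section 6.3 sets out to avoid.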

An important aspect of the algorithm is the way it manipulates sets of states. An item
on the worklist is of the form $(g, S_1, \ldots, S_n)$, representing a set of states. The global state $g$
is kept explicit because it is required for synchronization across threads at a context switch.
The local states need not be kept explicit, and they are collected in the sets $S_i$. This is
important because the set of local states can be infinite. In the PDS-based implementation,
the sets $S_i$ are kept in symbolic form using automata (Defn. 2.3.2). The poststar algorithm
works on these representations, mapping automata (capturing starting configurations) to
automata (capturing reachable configurations).

6.3  A  New  Algorithm  for  CBMC  Using  Transducers 

The QR algorithm fails to generalize to infinite-state abstractions because of its requirement
to keep the global state explicit in the worklist items. After each context switch, the
algorithm does a "fan-out" proportional to the size of the global state space $|G|$ (see Fig. 6.2)
to pass the global state to all other threads. This is also true for the PDS-based implementation
of the QR algorithm. The algorithm presented in this section avoids such a fan-out
(and will be extended to infinite-state abstractions in Section 6.4 and Section 6.5).

The  QR  algorithm  makes  several  calls  to  poststar  to  compute  the  forward  reachable  states 
in  a  single  thread.  This  is  crucial  to  be  able  to  work  with  infinite  sets  of  configurations. 
However,  the  disadvantage  is  that  poststar  requires  a  starting  set  of  configurations  to  find 
all  of  the  reachable  configurations.  Creation  of  this  starting  set  is  what  forces  the  fan-out 
operation  to  alternate  with  calls  to  poststar. 

A similar problem arises in interprocedural analysis of sequential programs: a procedure
can get called from multiple places with multiple different input values. Instead of reanalyzing
the procedure for each input value, it is analyzed independently of the calling context to
create a summary. This summary concisely describes the effect of executing the procedure
in any calling context, in terms of the relation between input to the procedure and its output.
Similarly, instead of reanalyzing a thread every time it receives control after a context
switch, we create a summary for it. The difficulty is that the "input" here is a starting set of
configurations, and the "output" is the reachable set of configurations; again, the summary
must be relation-valued. Because both of these sets can be infinite, we need the summary
to be representable symbolically.

Our approach to generalizing the QR algorithm (for both finite-state and infinite-state
data abstractions) is based on the following observation:

Observation 1. One can construct an appropriate summary of a thread's behavior using a
finite-state transducer (an automaton with input and output tapes).

Definition 6.3.1. A finite-state transducer $\tau$ is a tuple $(Q, \Sigma_i, \Sigma_o, \lambda, I, F)$, where $Q$ is a
finite set of states, $\Sigma_i$ and $\Sigma_o$ are input and output alphabets,
$\lambda \subseteq Q \times (\Sigma_i \cup \{\varepsilon\}) \times (\Sigma_o \cup \{\varepsilon\}) \times Q$
is the transition relation, $I \subseteq Q$ is the set of initial states, and $F \subseteq Q$ is the set of final
states. If $(q_1, a, b, q_2) \in \lambda$, written as $q_1 \xrightarrow{a/b} q_2$, we say that the transducer can go from
state $q_1$ to $q_2$ on input $a$, and outputs the symbol $b$. Given a state $q \in I$, we say that the
transducer can accept a string $\sigma_i \in \Sigma_i^*$ with output $\sigma_o \in \Sigma_o^*$ if there is a path from state $q$
to a final state that takes input $\sigma_i$ and outputs $\sigma_o$. The language of the transducer $\mathcal{L}(\tau)$ is
defined as the following subset of $\Sigma_i^* \times \Sigma_o^*$: $\{(\sigma_i, \sigma_o) \mid$ the transducer can output string $\sigma_o$
when the input is $\sigma_i\}$.

Given a PDS $\mathcal{P}$, one can construct a transducer $\tau_{\mathcal{P}}$ whose language equals $\Rightarrow^*$, the
transitive closure of $\mathcal{P}$'s transition relation: the transducer accepts a pair $(c_1, c_2)$ if a thread,
when started in state $c_1$, can reach state $c_2$. This result was first given by Caucal [17],
but it was not accompanied by a complexity result, except that the construction was polynomial time.
Our construction of transducers for WPDSs (strictly more general than Caucal's result)
makes use of recent advancements in the analysis of (W)PDSs [9, 30, 85, 83] for an efficient
construction. Because such transducers are of general importance, we give a complexity
result. The following theorem is derived from Thm. 6.4.4 given in Section 6.4.

Theorem 6.3.2. Given a PDS $\mathcal{P} = (P, \Gamma, \Delta)$, a transducer $\tau_{\mathcal{P}}$ can be constructed such that
it accepts input $(p_1\ u_1)$ and outputs $(p_2\ u_2)$ if and only if $(p_1, u_1) \Rightarrow^* (p_2, u_2)$. Moreover, this
transducer can be constructed in time $O(|P||\Delta|(|P||\Gamma| + |\Delta|))$ and has at most $|P|^2|\Gamma| + |P||\Delta|$
states.

The  advantage  of  using  transducers  is  that  they  are  closed  under  relational  composition. 

Lemma 6.3.3. Given transducers $\tau_1$ and $\tau_2$ with input and output alphabet $\Sigma$, one can
construct a transducer $(\tau_1; \tau_2)$ such that $\mathcal{L}(\tau_1; \tau_2) = \mathcal{L}(\tau_1); \mathcal{L}(\tau_2)$, where the latter denotes
composition of relations. Similarly, if $A$ is an automaton with alphabet $\Sigma$, one can construct
an automaton $\tau_1(A)$ such that its language is the image of $\mathcal{L}(A)$ under $\mathcal{L}(\tau_1)$, i.e., the set
$\{u \in \Sigma^* \mid \exists u' \in \mathcal{L}(A), (u', u) \in \mathcal{L}(\tau_1)\}$.

Both of these constructions are carried out in a manner similar to automaton intersection
[40]. For composing transducers, for each transition $p \xrightarrow{a/b} q$ in $\tau_1$ and transition $p' \xrightarrow{b/c} q'$
in $\tau_2$, add the transition $(p, p') \xrightarrow{a/c} (q, q')$ to their composition. For transducer-automaton
application, each transition $p \xrightarrow{a/b} q$ in $\tau_1$ is matched with transition $p' \xrightarrow{a} q'$ in $A$ to produce
transition $(p, p') \xrightarrow{b} (q, q')$ in $\tau_1(A)$. One can also take the union of transducers (union of
their languages) in a manner similar to union of automata.
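
The two product constructions can be sketched as follows. The sketch makes simplifying assumptions that are not in the text: transducers and automata are plain dictionaries with `trans`, `init` and `final` entries (a hypothetical encoding), and there are no $\varepsilon$-labels, so matching transitions pair up one for one. Handling $\varepsilon$ needs extra cases that are omitted here.

```python
def compose(t1, t2):
    """(t1; t2): match the output symbol of t1 with the input symbol of t2."""
    trans = {((p, p2), a, c, (q, q2))
             for (p, a, b, q) in t1["trans"]
             for (p2, b2, c, q2) in t2["trans"] if b == b2}
    return {"trans": trans,
            "init": {(i, j) for i in t1["init"] for j in t2["init"]},
            "final": {(f, g) for f in t1["final"] for g in t2["final"]}}

def apply_transducer(t, aut):
    """t(A): match the input symbol of t with the symbol read by A, keep the output."""
    trans = {((p, p2), b, (q, q2))
             for (p, a, b, q) in t["trans"]
             for (p2, a2, q2) in aut["trans"] if a == a2}
    return {"trans": trans,
            "init": {(i, j) for i in t["init"] for j in aut["init"]},
            "final": {(f, g) for f in t["final"] for g in aut["final"]}}

def relates(t, u, v):
    """True if the epsilon-free transducer t accepts input u with output v."""
    if len(u) != len(v):
        return False
    cur = set(t["init"])
    for a, b in zip(u, v):
        cur = {q for (p, x, y, q) in t["trans"] if p in cur and x == a and y == b}
    return bool(cur & t["final"])

def accepts(aut, w):
    """True if the automaton accepts the word w."""
    cur = set(aut["init"])
    for a in w:
        cur = {q for (p, x, q) in aut["trans"] if p in cur and x == a}
    return bool(cur & aut["final"])

# Example: t_ab rewrites every a to b, t_bc rewrites every b to c, A accepts a*.
t_ab = {"trans": {(0, "a", "b", 0)}, "init": {0}, "final": {0}}
t_bc = {"trans": {(0, "b", "c", 0)}, "init": {0}, "final": {0}}
A = {"trans": {(0, "a", 0)}, "init": {0}, "final": {0}}
```

Here `compose(t_ab, t_bc)` relates strings of a's to equally long strings of c's, and `apply_transducer(t_ab, A)` accepts exactly the strings of b's.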

Coming back to CBMC, each thread is represented using a PDS. Thus, we can construct a
transducer $\tau_{t_i}$ for the transition relation of thread $t_i$, i.e., for $\Rightarrow^*_{t_i}$. By extending $\tau_{t_i}$ to perform
the identity transformation on stack symbols of threads other than $t_i$ (using transitions of
the form $p \xrightarrow{\gamma/\gamma} q$), we obtain a transducer $\tau^c_{t_i}$ for $(\Rightarrow^c_{t_i})^*$. Next, a union of these transducers
gives $\tau_{ec}$, which represents $\Rightarrow_{ec}$. Performing the composition of $\tau_{ec}$ $k$ times with itself gives
us a transducer $\tau$ that represents $(\Rightarrow_{ec})^{k+1}$. If an automaton $A$ captures the set of starting
states of the concurrent program, $\tau(A)$ gives a single automaton for the set of all reachable
states in the program (under the context bound).
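
The shape of this computation can be illustrated on explicit finite relations, where a relation is just a set of pairs of program states. This is a sketch under assumptions: the per-thread relations are given as hypothetical edge sets over full program states (so the identity on other threads' local states is already built in), and the function names are made up. Transducers play the role of these sets when the state space is infinite.

```python
def compose_rel(r1, r2):
    """Relational composition r1; r2 on sets of pairs."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def closure(step, states):
    """Reflexive transitive closure of one thread's step relation."""
    rel = {(s, s) for s in states} | set(step)
    while True:
        new = rel | compose_rel(rel, rel)
        if new == rel:
            return rel
        rel = new

def bounded_relation(thread_steps, states, k):
    """Union the per-thread closures, then compose the union with itself k times."""
    tau_ec = set().union(*(closure(st, states) for st in thread_steps))
    tau = tau_ec
    for _ in range(k):
        tau = compose_rel(tau, tau_ec)
    return tau          # k + 1 execution contexts

def image(rel, start):
    """States related to some state in start."""
    return {b for (a, b) in rel if a in start}

# Toy program over states (g, l0, l1): thread 0 moves a -> b and sets g from 0 to 1;
# thread 1 moves x -> y and sets g from 1 to 2.
STATES = {(g, l0, l1) for g in (0, 1, 2) for l0 in "ab" for l1 in "xy"}
T0 = {((0, "a", l1), (1, "b", l1)) for l1 in "xy"}
T1 = {((1, l0, "x"), (2, l0, "y")) for l0 in "ab"}
```

With `k = 0` only one thread gets to run, so (2, b, y) is not in the image of the initial state (0, a, x); with `k = 1` it is. No per-global-state fan-out appears anywhere: the summary relation is built once and then composed.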

We  believe  that  the  above  algorithm  provides  a  better  basis  for  implementing  a  tool 
for  CBMC  than  the  QR  algorithm.  In  particular,  the  new  algorithm  avoids  the  fan-out 
step,  which — as  we  show  below — allows  it  to  be  extended  to  infinite-state  data  abstractions. 
To  make  this  extension,  we  represent  (recursive)  programs  with  infinite-state  abstractions 
using  WPDSs.  Extending  our  algorithm  to  WPDSs  presents  two  challenges:  one  is  the 
construction  of  a  weighted  transducer  for  a  WPDS,  and  the  other  is  the  composition  of  two 
weighted  transducers.  These  issues  are  addressed  in  Section  6.4  and  Section  6.5,  respectively. 

6.4  Weighted  Transducers 

In this section, we show how to construct a weighted transducer for the weighted relation
of a WPDS. We defer the definition of a weighted transducer to a little later in this
section (Defn. 6.4.3).

$$(p_1, \gamma_1\gamma_2\gamma_3\cdots\gamma_n) \Rightarrow^{\sigma_1} (p_2, \gamma_2\gamma_3\cdots\gamma_{k+1}\gamma_{k+2}\cdots\gamma_n) \Rightarrow^{\sigma_2} (p_3, \gamma_3\cdots\gamma_{k+1}\gamma_{k+2}\cdots\gamma_n) \Rightarrow \cdots \Rightarrow (p_{k+1}, \gamma_{k+1}\gamma_{k+2}\cdots\gamma_n) \Rightarrow^{\sigma_{k+1}} (p_{k+2}, u_1u_2\cdots u_j\,\gamma_{k+2}\cdots\gamma_n)$$

Figure 6.3 A path in the PDS's transition relation; $u_i \in \Gamma$, $j \geq 1$, $k < n$.

Our solution uses the following observation about paths in a PDS's
transition relation. Every path $\sigma \in \Delta^*$ that starts from a configuration $(p_1, \gamma_1\gamma_2\cdots\gamma_n)$
can be decomposed as $\sigma = \sigma_1\sigma_2\cdots\sigma_k\sigma_{k+1}$ (see Fig. 6.3) such that $(p_i, \gamma_i) \Rightarrow^{\sigma_i} (p_{i+1}, \varepsilon)$ for
$1 \leq i \leq k$, and $(p_{k+1}, \gamma_{k+1}) \Rightarrow^{\sigma_{k+1}} (p_{k+2}, u_1u_2\cdots u_j)$: every path has zero or more pop phases
$(\sigma_1, \sigma_2, \ldots, \sigma_k)$ followed by a single growth phase $(\sigma_{k+1})$:

1. Pop-phase: A path such that the net effect of the pushes and pops performed along
the path is to take $(p, \gamma u)$ to $(p', u)$, without looking at $u \in \Gamma^*$. Equivalently, it can
take $(p, \gamma)$ to $(p', \varepsilon)$.

2. Growth-phase: A path such that the net effect of the pushes and pops performed
along the path is to take $(p, \gamma u)$ to $(p', u'u)$ with $u' \in \Gamma^+$, without looking at $u \in \Gamma^*$.
Equivalently, it can take $(p, \gamma)$ to $(p', u')$.

Intuitively, this holds because for a path to look at $\gamma_2$, it must pop off $\gamma_1$. If it does not
pop off $\gamma_1$, then the path is in a growth phase starting from $\gamma_1$. Otherwise, the path just
completed a pop phase. We construct the transducer for a WPDS by computing the net
transformation (weight) implied by these phases. First, we define two procedures:

1. $pop : P \times \Gamma \times P \to D$ is defined as follows:

$pop(p, \gamma, p') = \bigoplus\{v(\sigma) \mid (p, \gamma) \Rightarrow^{\sigma} (p', \varepsilon)\}$

2. $grow : P \times \Gamma \to ((P \times \Gamma^+) \to D)$ is defined as follows:

$grow(p, \gamma)(p', u) = \bigoplus\{v(\sigma) \mid (p, \gamma) \Rightarrow^{\sigma} (p', u)\}$

Note that $grow(p, \gamma) = poststar((p, \gamma))$, where the latter is interpreted as a function from
configurations to weights that maps a configuration to the weight with which it is accepted,
or $\overline{0}$ if it is not accepted. The following lemmas give efficient algorithms for computing the
above quantities.

Lemma 6.4.1. Let $A = (P, \Gamma, \emptyset, P, P)$ be a $\mathcal{P}$-automaton that represents the set of
configurations $C = \{(p, \varepsilon) \mid p \in P\}$. Let $A_{pop}$ be the forward weighted-automaton obtained by
running prestar on $A$. Then $pop(p, \gamma, p')$ is the weight on the transition $(p, \gamma, p')$ in $A_{pop}$.
We can generate $A_{pop}$ in time $O_s(|P|^2|\Delta|H)$, and it has at most $|P|$ states.

Proof.  This  follows  directly  from  Lem.  2.3.4  and  its  weighted  version  described  in  Section 
2.4.1.  However,  we  give  a  slightly  informal,  but  intuitive,  proof  here.  We  use  the  fact  that 
the  saturation-based  implementation  of  prestar  (Section  2.4.1)  is  correct. 

The lemma runs prestar on the empty automaton, i.e., one with no transitions, which
represents the configuration set $C = \{(p, \varepsilon) \mid p \in P\}$. Let $\beta$ be a stack symbol not in $\Gamma$, and
let $A^p_\beta$ be an automaton with two states $\{p, q\}$, $q \notin P$, and a single transition $(p, \beta, q)$. Let $q$
be the final state of this automaton. Because $\beta \notin \Gamma$, running prestar on $A^p_\beta$ will return the
same automaton as the one returned by running prestar on the empty automaton, except for
the extra transition $(p, \beta, q)$ (because no rule can match $\beta$). $A^p_\beta$ represents the configuration
set $\{(p, \beta)\}$, and therefore, $prestar(A^p_\beta)((p', \gamma\beta)) = pop(p', \gamma, p)$ according to the definition of pop.
However, $prestar(A^p_\beta)((p', \gamma\beta))$ is exactly the weight on the transition $(p', \gamma, p)$ because the only path
in $prestar(A^p_\beta)$ that accepts $(\gamma\beta)$ starting in state $p'$ is the one that follows transitions $(p', \gamma, p)$ and
$(p, \beta, q)$. The result follows by repeating the argument for all $p \in P$. □

Lemma 6.4.2. Let $A_F = (Q, \Gamma, \to, P, F)$ be a $\mathcal{P}$-automaton, where $Q = P \cup \{q_{p,\gamma} \mid p \in
P, \gamma \in \Gamma\}$ and $p \xrightarrow{\gamma} q_{p,\gamma}$ for each $p \in P$, $\gamma \in \Gamma$. Then $A_{\{q_{p,\gamma}\}}$ represents the configuration
$(p, \gamma)$. Let $A$ be this automaton where we leave the set of final states undefined. Let $A_{grow}$
be the backward weighted-automaton obtained from running poststar on $A$ (poststar does not
need to know the final states). If we restrict the final states in $A_{grow}$ to be just $q_{p,\gamma}$ (and
remove all states that do not have an accepting path to the final state), we obtain a backward
weighted-automaton $A_{p,\gamma} = poststar((p, \gamma)) = grow(p, \gamma)$. We can compute $A_{grow}$ in time
$O_s(|P||\Delta|(|P||\Gamma| + |\Delta|)H)$, and it has at most $|P||\Gamma| + |\Delta|$ states.

Proof. The proof is similar to the one given for Lem. 6.4.1. Let $\beta \notin \Gamma$ be a new stack symbol.
Let $A^\beta_{p,\gamma}$ be the automaton $A$ with an extra state $q_f$ and an extra transition $(q_{p,\gamma}, \beta, q_f)$. Let
$q_f$ be the final state of this automaton. $A^\beta_{p,\gamma}$ represents the configuration set $\{(p, \gamma\beta)\}$. The
automaton returned by $poststar(A^\beta_{p,\gamma})$ would then represent the configuration set $grow(p, \gamma)$
with $\beta$ appended at the end of the stack. The proof follows from the fact that running
poststar on $A^\beta_{p,\gamma}$ is the same as running it on $A$ (for all $p$ and $\gamma$) with the exception of the
extra $\beta$-transition. □

The advantage of the construction presented in Lem. 6.4.2 is that it just requires a single
poststar query to compute all of the $A_{p,\gamma}$, instead of one query for each $p \in P$ and $\gamma \in \Gamma$.
Because the standard poststar algorithm builds an automaton that is larger than the input
automaton (Lem. 2.4.11), $A_{grow}$ has many fewer states than those in all of the individual
$A_{p,\gamma}$ automata put together.

Fig. 6.4(b) and (c) show the $A_{grow}$ and $A_{pop}$ automata for a simple WPDS constructed
over the minpath semiring (Defn. 2.4.14).

The idea behind our approach is to use $A_{pop}$ to simulate the first phase where the PDS
pops off stack symbols. With reference to Fig. 6.3, the transducer consumes $\gamma_1\cdots\gamma_k$ from
the input tape. When the transducer (non-deterministically) decides to switch over to the
growth phase, and is in state $p_{k+1}$ in $A_{pop}$ with $\gamma_{k+1}$ being the next symbol in the input,
it passes control to $A_{p_{k+1},\gamma_{k+1}}$ to start generating the output $u_1\cdots u_j$. Then it moves into
an accept phase where it copies the untouched part of the input stack $(\gamma_{k+2}\cdots\gamma_n)$ to the
output.

This can be optimized by avoiding a separate copy of $A_{p,\gamma}$ for each $\gamma$. Let $A_p$ be the same
as $A_{grow}$, but with final states restricted to $\{q_{p,\gamma} \mid \gamma \in \Gamma\}$, and unreachable states appropriately
pruned (see Fig. 6.4(d) and (e)). The transducer we construct will non-deterministically
guess the stack symbol $\gamma$ from which the growth phase starts, pass control to $A_p$, and then
verify that the guess was correct when it reaches the final state $q_{p,\gamma}$ in $A_p$. As a result, we
just need $|P|$ copies of $A_{grow}$.

Figure 6.4 Weighted transducer construction: (a) A simple WPDS with the minpath
semiring. (b) The $A_{grow}$ automaton. Edges are labeled with their stack symbol and weight.
(c) The $A_{pop}$ automaton. (d) The $A_{p_1}$ automaton obtained from $A_{grow}$. (e) The $A_{p_2}$
automaton obtained from $A_{grow}$. The unnamed state in (c) and (d) is an extra state added
by the poststar algorithm used in Lem. 6.4.2. (f) The weighted transducer. The boxes
represent "copies" of $A_{pop}$, $A_{p_1}$ and $A_{p_2}$ as required by steps 2 and 3 of the construction.
The transducer paths that accept input $(p_1\ a)$ and output $(p_2\ b^n)$, for $n > 2$, with weight $n$
are highlighted in bold.

Note that $A_{pop}$ is a forward-weighted automaton, whereas $A_{grow}$ is a backward-weighted
automaton. Therefore, when we mix them together to build a transducer, we must allow it to
switch directions for computing the weight of a path. Consider Fig. 6.3; a PDS rule sequence
consumes the input configuration from left to right (in the pop phase), but produces the
output stack configuration $u$ from right to left (as it pushes symbols on the stack). Because
we need the transducer to output $u_1\cdots u_j$ from left to right, we need to switch directions
for computing the weight of a path. For this, we define partitioned transducers.

Definition 6.4.3. A partitioned weighted finite-state transducer $\tau$ is a tuple
$(Q, \{Q_i\}_{i=1}^{2}, S, \Sigma_i, \Sigma_o, \lambda, I, F)$ where $Q$ is a finite set of states, $\{Q_1, Q_2\}$ is a partition of
$Q$, $S = (D, \oplus, \otimes, \overline{0}, \overline{1})$ is a bounded idempotent semiring, $\Sigma_i$ and $\Sigma_o$ are input and output
alphabets, $\lambda \subseteq Q \times D \times (\Sigma_i \cup \{\varepsilon\}) \times (\Sigma_o \cup \{\varepsilon\}) \times Q$ is the transition relation, $I \subseteq Q_1$ is the
set of initial states, and $F \subseteq Q_2$ is the set of final states. We restrict the transitions that
cross the state partition: if $(q, w, a, b, q') \in \lambda$ and $q \in Q_l$, $q' \in Q_k$ and $l \neq k$, then $l = 1$, $k = 2$
and $w = \overline{1}$. Given a state $q \in I$, the transducer accepts a string $\sigma_i \in \Sigma_i^*$ with output $\sigma_o \in \Sigma_o^*$
if there is a path from state $q$ to a final state that takes input $\sigma_i$ and outputs $\sigma_o$.

If a path $\eta$ goes through states $q_1, \ldots, q_m$, the weight of the $i$th transition is
$w_i$, and all states $q_i$ are in $Q_j$ for some $j$, then the weight of this path $v(\eta)$ is $w_1 \otimes w_2 \otimes \cdots \otimes w_m$
if $j = 1$ and $w_m \otimes w_{m-1} \otimes \cdots \otimes w_1$ if $j = 2$, i.e., the state partition determines the direction
in which we perform extend. If a path $\eta$ crosses partitions, i.e., $\eta = \eta_1\eta_2$ such that
each $\eta_j$ is a path entirely inside $Q_j$, then $v(\eta) = v(\eta_1) \otimes v(\eta_2)$.

In  this  chapter,  we  refer  to  partitioned  weighted  transducers  as  weighted  transducers,  or 
simply  transducers  when  there  is  no  possibility  of  confusion.  Note  that  when  the  extend 
operator  is  commutative,  partitioning  is  unnecessary. 
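
The direction switch can be made concrete with a small sketch. It is an illustration only: weights are taken to be strings with concatenation as a non-commutative extend, which is enough to make the order of operands visible but is not a complete semiring, and the weight-$\overline{1}$ transition that crosses the partition is left out. The function name `path_weight` is made up.

```python
from functools import reduce
from operator import add

def path_weight(weights_q1, weights_q2, extend, one):
    """v(eta) for eta = eta1 eta2: extend left to right inside Q1, right to left inside Q2."""
    v1 = reduce(extend, weights_q1, one)             # w1 (x) w2 (x) ... (x) wm
    v2 = reduce(extend, reversed(weights_q2), one)   # wm (x) ... (x) w2 (x) w1
    return extend(v1, v2)

# Non-commutative extend (string concatenation): the order inside Q2 is reversed.
noncommutative = path_weight(["a", "b"], ["c", "d"], add, "")

# Commutative extend (integer addition, as in minpath): the direction does not matter.
commutative = path_weight([1, 2], [3, 4], add, 0)
flipped = path_weight([1, 2], [4, 3], add, 0)
```

With concatenation the result is "abdc" rather than "abcd", which is exactly the reversal the partition encodes; with addition both orders give the same weight, matching the remark that partitioning is unnecessary when extend is commutative.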

Let $St(A)$ denote the set of states of an automaton $A$. Because each of $A_{pop}$ and $A_p$ has
$P$ as a subset of its set of states, we distinguish them by referring to a state $q \in St(A_{pop})$
by $q_{pop}$ and $q \in St(A_p)$ by $q_p$.

Given a WPDS $\mathcal{W}$, we construct the desired weighted transducer $\tau_{\mathcal{W}}$ using the steps
given below. $\tau_{\mathcal{W}}$ has states $\{q_i, q_f\} \cup St(A_{pop}) \cup (\bigcup_{p \in P} St(A_p))$, input alphabet $P \cup \Gamma$, output
alphabet $P \cup \Gamma$, weight domain the same as $\mathcal{W}$, initial state $q_i$, and final state $q_f$. Its state
partition is $Q_1 = \{q_i\} \cup St(A_{pop})$ and $Q_2 = \{q_f\} \cup (\bigcup_{p \in P} St(A_p))$. The part of the transducer
contained in $Q_1$ simulates the pop phase, and the part contained in $Q_2$ simulates the growth
phase, including the part where the untouched part of the stack is copied to the output tape.
Transitions to $\tau_{\mathcal{W}}$ are added as follows (an example is given in Fig. 6.4):

1. For each state $p \in P$, add the transition $(q_i, (p/\varepsilon), p_{pop})$ with weight $\overline{1}$ to $\tau_{\mathcal{W}}$.

2. For each transition $(p^1_{pop}, \gamma, p^2_{pop})$ with weight $w$ in $A_{pop}$ add the transition
$(p^1_{pop}, (\gamma/\varepsilon), p^2_{pop})$ with the same weight to $\tau_{\mathcal{W}}$, i.e., copy over $A_{pop}$.

3. For each transition $(q_p, \gamma', q'_p)$ in each automaton $A_p$ add the transition $(q_p, (\varepsilon/\gamma'), q'_p)$
with the same weight to $\tau_{\mathcal{W}}$, i.e., copy over each of the $A_p$.

4. For each $q, q' \in P$, add the transition $(q_{pop}, (\varepsilon/q'), q'_q)$ with weight $\overline{1}$ to $\tau_{\mathcal{W}}$. This
transition permits a switch from the pop phase to the growth phase. At this point,
we just know that the growth phase begins in state $q$ and ends in state $q'$. This step
guesses the stack symbol from which the growth phase starts. The next step verifies
that our guess was correct.

5. For each final state $q_{p,\gamma} \in St(A_p)$, add the transition $(q_{p,\gamma}, (\gamma/\varepsilon), q_f)$ with weight $\overline{1}$ to
$\tau_{\mathcal{W}}$. This transition verifies that $\gamma$ was on the input tape, and we just completed the
growth phase starting from $\gamma$.

6. For each $p, q \in P$, add the transition $(q_p, (\varepsilon/\varepsilon), q_f)$ with weight $\overline{1}$ to $\tau_{\mathcal{W}}$. This transition
allows us to skip the growth phase by going directly to the final state.

7. For each $\gamma \in \Gamma$, add the transition $(q_f, (\gamma/\gamma), q_f)$ with weight $\overline{1}$ to $\tau_{\mathcal{W}}$. This part of the
transducer copies over the untouched part of the input tape to the output tape.

Theorem 6.4.4. When the transducer $\tau_{\mathcal{W}}$, as constructed above, is given input $(p\ z)$,
$p \in P$, $z \in \Gamma^*$, then the combine over the values of all paths in $\tau_{\mathcal{W}}$ that output the string
$(p'\ z')$ is precisely $IJOP(\{(p, z)\}, \{(p', z')\})$. Moreover, this transducer can be constructed
in time $O_s(|P||\Delta|(|P||\Gamma| + |\Delta|)H)$, has at most $|P|^2|\Gamma| + |P||\Delta|$ states and at most $|P|^2|\Delta|^2$
transitions.

Proof. The proof is based on the observation made in Fig. 6.3. Suppose that we have
a path in the PDS transition relation from $(p, \gamma_1\gamma_2\cdots\gamma_n)$ to $(p_{k+1}, u\gamma_{k+2}\cdots\gamma_n)$ that can
be broken down as shown in Fig. 6.5.

$$(p, \gamma_1\gamma_2\cdots\gamma_n) \Rightarrow^*_{w_1} (p_1, \gamma_2\cdots\gamma_n) \Rightarrow^*_{w_2} (p_2, \gamma_3\cdots\gamma_n) \Rightarrow^* \cdots \Rightarrow^*_{w_k} (p_k, \gamma_{k+1}\cdots\gamma_n) \Rightarrow^*_{w_{k+1}} (p_{k+1}, u\gamma_{k+2}\cdots\gamma_n)$$

Figure 6.5 A path in the PDS's transition relation with corresponding weights of each step.

Then in the transducer, we can take the path
starting at $q_i$ that first takes the transition $(q_i, (p/\varepsilon), p_{pop})$ (Step 1 of the construction)
and moves into state $p$ of $A_{pop}$. Then it successively takes the transitions $(p, (\gamma_1/\varepsilon), p_1)$,
$(p_1, (\gamma_2/\varepsilon), p_2)$, $(p_2, (\gamma_3/\varepsilon), p_3)$, $\ldots$, $(p_{k-1}, (\gamma_k/\varepsilon), p_k)$ (Step 2), all the time staying inside $A_{pop}$.
If the weight of the $i$th such transition is $w'_i$ then $w_i \sqsubseteq w'_i$. This follows from Lem. 6.4.1. Next, the
transducer can take transition $(p_k, (\varepsilon/p_{k+1}), p_{k+1})$ (Step 4) and move into $A_{p_k}$. Then it can take
a path that outputs $u$ and move into state $q_{p_k,\gamma_{k+1}}$. There is one such path because $A_{p_k}$ can
accept $u$ starting in state $p_{k+1}$ (representing the configuration $(p_{k+1}, u)$) when the final state
is $q_{p_k,\gamma_{k+1}}$ (Lem. 6.4.2). Moreover, $w_{k+1} \sqsubseteq$ the combine of weights of all such paths in the
transducer. After this, the transducer can take transition $(q_{p_k,\gamma_{k+1}}, (\gamma_{k+1}/\varepsilon), q_f)$ (Step 5)
and copy the stack $(\gamma_{k+2}\cdots\gamma_n)$ on to the output tape in the final state $q_f$ (Step 7). The path
we just described took input $(p, z) = (p\ \gamma_1\gamma_2\cdots\gamma_n)$ and output $(p', z') = (p_{k+1}\ u\gamma_{k+2}\cdots\gamma_n)$
as required, and the weight of the path shown in Fig. 6.5 $(w_1 \otimes w_2 \otimes \cdots \otimes w_{k+1})$ is $\sqsubseteq$ the combine
of weights of all paths in the transducer with this behavior. Note that there is a corresponding
path in the transducer (that uses transitions inserted in Step 6) when the path shown in
Fig. 6.5 has no growth phase. Thus, $IJOP((p, z), (p', z')) \sqsubseteq \tau_{\mathcal{W}}((p\ z), (p'\ z'))$.

To argue the other direction, the reasoning is similar. A path in the transducer must
start in state $q_i$, then move into $A_{pop}$, then into $A_p$ (for some $p \in P$) and then move to state
$q_f$. Keeping track of the input and output required for this path, we can build the WPDS
path as in Fig. 6.5. Using Lemmas 6.4.1 and 6.4.2, $IJOP((p, z), (p', z')) \sqsupseteq$ the weight of such
a path in the transducer. Thus, $IJOP((p, z), (p', z')) \sqsupseteq \tau_{\mathcal{W}}((p\ z), (p'\ z'))$. □

Usually the WPDSs used for modeling programs have $|P| = 1$ and $|\Gamma| < |\Delta|$. In that
case, constructing a transducer has similar complexity and size as running a single poststar
query.


6.5  Composing  Weighted  Transducers 

Composition of unweighted transducers is straightforward, but this is not the case
with weighted transducers. For a weighted transducer $\tau$, let $\mathcal{L}(\tau)$ be a weighted relation
(Defn. 2.4.15) that contains $(c_i, c_o, w)$ if and only if $w$ is the combine of the weights of all
paths in $\tau$ that take input $c_i$ and output $c_o$.

Composition of weighted transducers is defined as follows: given two weighted transducers
$\tau_1$ and $\tau_2$, construct another weighted transducer $\tau_3$ such that $\mathcal{L}(\tau_3) = \mathcal{L}(\tau_1); \mathcal{L}(\tau_2)$, where
the composition operator denotes composition of weighted relations (Defn. 2.4.15).

We  begin  with  a  slightly  simpler  problem  on  weighted  automata.  The  machinery  that 
we  develop  for  this  problem  will  be  used  for  composing  weighted  transducers. 

6.5.1  The  Sequential  Product  of  Two  Weighted  Automata 

Given forward-weighted automata $A_1$ and $A_2$, we define their sequential product as
another weighted automaton $A_3$ such that for any configuration $c$, $A_3(c) = A_1(c) \otimes A_2(c)$.
More generally, we want the following identity for any regular set of configurations $C$:
$A_3(C) = \bigoplus\{A_3(c) \mid c \in C\} = \bigoplus\{A_1(c) \otimes A_2(c) \mid c \in C\}$. (In this section, we assume
that configurations consist of just the stack and $|P| = 1$.) This problem is the special case
of transducer composition when a transducer only has transitions of the form $(\gamma/\gamma)$. For the
Boolean weight domain (Defn. 2.4.12), it reduces to unweighted automaton intersection (with
words accepted with weight $\overline{0}$ being considered as words not accepted by the automaton).

Note that this is a different version of the weighted-automaton intersection problem from
the one that was solved for computing an error projection (Section 5.2). For computing error projections,
we only needed to take the sequential product of a forward-weighted automaton (obtained
as a result of running poststar) with a backward-weighted automaton (obtained as a result of
running prestar). Moreover, the result of this operation was a functional automaton, which
is not a weighted automaton. In this section, we work with same-direction automata, and
the resulting automaton $A_3$ that we compute is still a valid weighted automaton (but with a
different weight domain), thus enabling us to take its sequential product with other weighted
automata (over the same weight domain). The solution in this section is more general than
the one presented earlier in Section 5.2. We show later, in Section 6.5.3, that this technique
also generalizes to take the sequential product of automata with different directions.

Figure 6.6 Forward-weighted automata. Their final states are $q_1$, $q_2$, and $(q_1, q_2)$,
respectively.

To take the sequential product of weighted automata, we start with the algorithm for
intersecting unweighted automata. This is done by taking transitions $(q_1, \gamma, q_2)$ and $(q'_1, \gamma, q'_2)$
in the respective automata to produce $((q_1, q'_1), \gamma, (q_2, q'_2))$ in the new automaton. We would
like to do the same with weighted transitions: given weights of the matching transitions, we
want to compute a weight for the created transition. In Fig. 6.6, intersecting automata $A_1$
and $A_2$ produces $A_3$ (ignore the weights for now). Automaton $A_3$ should accept $(a\ b)$ with
weight $A_1(a\ b) \otimes A_2(a\ b) = w_1 \otimes w_2 \otimes w_4 \otimes w_5$.

One way of achieving this is to pair the weights while intersecting (as shown in $A_3$ in
Fig. 6.6). Matching the transitions with weights $w_1$ and $w_4$ produces a transition with
weight $(w_1, w_4)$. For reading off weights, we need to define operations on paired weights.
Define extend on pairs ($\otimes_p$) to be componentwise extend ($\otimes$). Then $A_3(a\ b) = (w_1, w_4) \otimes_p
(w_2, w_5) = (w_1 \otimes w_2, w_4 \otimes w_5)$. Taking an extend of the two components produces the desired
answer. Thus, this $A_3$ together with a read-out operation in the end (that maps a weight
pair to a weight) is a first attempt at constructing the sequential product of $A_1$ and $A_2$.

Because the number of accepting paths in an automaton may be infinite, one also needs a combine ($\oplus_p$) on paired weights. The natural attempt is to define it componentwise. However, this is not precise. For example, if $C = \{c_1, c_2\}$ then $\mathcal{A}_3(C)$ should be $(\mathcal{A}_1(c_1) \otimes \mathcal{A}_2(c_1)) \oplus (\mathcal{A}_1(c_2) \otimes \mathcal{A}_2(c_2))$. However, using componentwise combine, we would get $\mathcal{A}_3(C) = \mathcal{A}_3(c_1) \oplus_p \mathcal{A}_3(c_2) = (\mathcal{A}_1(c_1) \oplus \mathcal{A}_1(c_2),\ \mathcal{A}_2(c_1) \oplus \mathcal{A}_2(c_2))$. Applying the read-out operation (extend of the components) gives four terms $\bigoplus \{ \mathcal{A}_1(c_i) \otimes \mathcal{A}_2(c_j) \mid 1 \le i, j \le 2 \}$, which includes cross terms like $\mathcal{A}_1(c_1) \otimes \mathcal{A}_2(c_2)$. The same problem arises also for a single configuration $c$ if $\mathcal{A}_3$ has multiple accepting paths for it.
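The loss of precision described above can be seen on a tiny instance. The following is a minimal sketch, assuming a weight domain of binary relations (extend is relational composition, combine is union); the automaton weights `A1` and `A2` are hypothetical values chosen for illustration and do not come from Fig. 6.6.

```python
def extend(r1, r2):
    """Relational composition: apply r1 first, then r2."""
    return frozenset((a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2)


def combine(r1, r2):
    """Combine of two relations is their union."""
    return r1 | r2


# Hypothetical weights that two automata assign to configurations c1 and c2.
A1 = {"c1": frozenset({(0, 1)}), "c2": frozenset({(0, 2)})}
A2 = {"c1": frozenset({(2, 3)}), "c2": frozenset({(1, 4)})}

# Desired answer: combine over c of (A1(c) extend A2(c)).
precise = combine(extend(A1["c1"], A2["c1"]), extend(A1["c2"], A2["c2"]))

# Pairing: combine componentwise, then read out by extending the components.
paired = (combine(A1["c1"], A1["c2"]), combine(A2["c1"], A2["c2"]))
read_out = extend(paired[0], paired[1])

# The cross terms A1(c1);A2(c2) and A1(c2);A2(c1) are what leak into read_out.
cross = combine(extend(A1["c1"], A2["c2"]), extend(A1["c2"], A2["c1"]))

print(sorted(precise), sorted(read_out), sorted(cross))
```

In this instance the precise answer is the empty relation, while the paired read-out contains exactly the two cross terms, so pairing would report data transformations that no single configuration supports.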

Under  certain  circumstances  there  is  an  alternative  to  pairing  that  lets  us  compute  precisely 
the  desired  sequential  product  of  weighted  automata: 

Definition 6.5.1. The nth sequentializable tensor product (n-STP) of a weight domain $S = (D, \oplus, \otimes, 0, 1)$ is defined as another weight domain $S_T = (D_T, \oplus_T, \otimes_T, 0_T, 1_T)$ with operations $\odot : D^n \rightarrow D_T$ (called the tensor operation) and $\mathit{DeTensor} : D_T \rightarrow D$ such that for all $w_j, w_j' \in D$ and $t_1, t_2 \in D_T$,

1. $\odot(w_1, w_2, \cdots, w_n) \otimes_T \odot(w_1', w_2', \cdots, w_n') = \odot(w_1 \otimes w_1', w_2 \otimes w_2', \cdots, w_n \otimes w_n')$

2. $\mathit{DeTensor}(\odot(w_1, w_2, \cdots, w_n)) = (w_1 \otimes w_2 \otimes \cdots \otimes w_n)$ and

3. $\mathit{DeTensor}(t_1 \oplus_T t_2) = \mathit{DeTensor}(t_1) \oplus \mathit{DeTensor}(t_2)$.

When $n = 2$, we write the tensor operator as an infix operator. Note that because of the first condition in the above definition, $1_T = \odot(1, \cdots, 1)$ and $0_T = \odot(0, \cdots, 0)$. Intuitively, one may think of the tensor product of $i$ weights as a kind of generalized $i$-tuple of those weights. The first condition above implies that extend of weight-tuples must be carried out componentwise. The DeTensor operation is the “read-out” operation that puts together the tensor product by taking extend of its components. The third condition is the key. It distinguishes the tensor product from a simple tupling operation. It requires that the DeTensor operation distribute over the combine of the tensored domain, which pairing does not satisfy.

If a 2-STP exists for a weight domain, then we can take the product of weighted automata for that domain: if $\mathcal{A}_1$ and $\mathcal{A}_2$ are the two input automata, then for each transition $(p_1, \gamma, q_1)$ with weight $w_1$ in $\mathcal{A}_1$, and transition $(p_2, \gamma, q_2)$ with weight $w_2$ in $\mathcal{A}_2$, add the transition $((p_1, p_2), \gamma, (q_1, q_2))$ with weight $(w_1 \odot w_2)$ to $\mathcal{A}_3$. The resulting automaton satisfies the property: $\mathit{DeTensor}(\mathcal{A}_3(c)) = \mathcal{A}_1(c) \otimes \mathcal{A}_2(c)$, and more generally, $\mathit{DeTensor}(\mathcal{A}_3(C)) = \bigoplus \{ \mathcal{A}_1(c) \otimes \mathcal{A}_2(c) \mid c \in C \}$. Thus, with the application of the DeTensor operation, $\mathcal{A}_3$ behaves like the desired automaton for the product of $\mathcal{A}_1$ and $\mathcal{A}_2$. A similar construction and proof hold for taking the product of n automata at the same time, when an n-STP exists.

The proof follows from the definitions. Let $\mathit{accPath}(\mathcal{A}_i, \sigma_u, w)$ be a predicate that denotes that $\sigma_u$ is a path in $\mathcal{A}_i$ from its initial state to a final state that accepts the word $u$, and $w$ is the weight of the path (computed by performing extends of weights on transitions in the path, in order). Because of the way the automata-intersection algorithm is carried out, we know that paths that accept a word $u$ in $\mathcal{A}_3$ are in one-to-one correspondence with pairs consisting of a path that accepts $u$ in $\mathcal{A}_1$ and a path that accepts $u$ in $\mathcal{A}_2$. If $\sigma_u^i$ is an accepting path for $u$ in $\mathcal{A}_i$ ($i = 1, 2$), then we can uniquely determine an accepting path $(\sigma_u^1, \sigma_u^2)$ for $u$ in $\mathcal{A}_3$, and vice versa. These properties can be used to prove that if $\mathit{accPath}(\mathcal{A}_3, (\sigma_u^1, \sigma_u^2), w)$ holds, then $w = w_1 \odot w_2$ such that $\mathit{accPath}(\mathcal{A}_i, \sigma_u^i, w_i)$ holds for $i = 1, 2$. This gives us:

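\begin{align*}
\mathit{DeTensor}(\mathcal{A}_3(C))
 &= \mathit{DeTensor}\Big(\textstyle\bigoplus_T \{ w \mid \mathit{accPath}(\mathcal{A}_3, \sigma_c, w),\ c \in C \}\Big) \\
 &= \bigoplus \{ \mathit{DeTensor}(w) \mid \mathit{accPath}(\mathcal{A}_3, \sigma_c, w),\ c \in C \} \\
 &= \bigoplus \{ \mathit{DeTensor}(w_1 \odot w_2) \mid \mathit{accPath}(\mathcal{A}_i, \sigma_c^i, w_i),\ c \in C,\ i = 1, 2,\ \sigma_c = (\sigma_c^1, \sigma_c^2) \} \\
 &= \bigoplus \{ w_1 \otimes w_2 \mid \mathit{accPath}(\mathcal{A}_i, \sigma_c^i, w_i),\ c \in C,\ i = 1, 2 \} \\
 &= \bigoplus \{ \mathcal{A}_1(c) \otimes \mathcal{A}_2(c) \mid c \in C \}
\end{align*}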

With the application of the DeTensor operation at the end, $\mathcal{A}_3$ behaves like the desired automaton for the product of $\mathcal{A}_1$ and $\mathcal{A}_2$. A similar construction and proof hold for taking the product of n automata at the same time, when an n-STP exists.

Before  generalizing  to  composition  of  transducers,  we  show  that  an  n-STP  exists,  for  all  n, 
for  a  class  of  weight  domains.  This  class  includes  the  one  needed  to  perform  affine-relation 
analysis  (Section  3.5.2). 

6.5.2  Sequentializable  Tensor  Product 

We say that a weight domain is commutative if its extend is commutative. An STP is easy to construct for commutative domains (tensor is extend, and DeTensor is identity). This result is somewhat expected: the difficulty in taking the sequential product of weighted automata $\mathcal{A}_1$ and $\mathcal{A}_2$ is that while the input word (or configuration) is read synchronously by them, their weights have to be read off in sequence (first $\mathcal{A}_1(c)$, then $\mathcal{A}_2(c)$). When extend is commutative, the weights can be read off synchronously as well.
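The product construction and the commutative special case can be sketched together. The code below is an illustrative sketch only: it assumes a min-plus weight domain (combine is `min`, extend is `+`, which is commutative), takes the tensor to be extend and DeTensor to be the identity as described above, and uses two small hypothetical automata stored as dictionaries.

```python
INF = float("inf")
ZERO, ONE = INF, 0  # semiring zero and one for the min-plus domain


def combine(a, b):
    return min(a, b)


def extend(a, b):
    return a + b


def word_weight(aut, word):
    """Combine, over all accepting paths for `word`, the extend of their weights."""
    cur = {aut["init"]: ONE}
    for sym in word:
        nxt = {}
        for (p, a, q), w in aut["trans"].items():
            if a == sym and p in cur:
                nxt[q] = combine(nxt.get(q, ZERO), extend(cur[p], w))
        cur = nxt
    result = ZERO
    for q in aut["final"]:
        result = combine(result, cur.get(q, ZERO))
    return result


def product(a1, a2, tensor):
    """Match transitions on their symbol and tensor the two weights."""
    trans = {}
    for (p1, s1, q1), w1 in a1["trans"].items():
        for (p2, s2, q2), w2 in a2["trans"].items():
            if s1 == s2:
                trans[((p1, p2), s1, (q1, q2))] = tensor(w1, w2)
    finals = {(f1, f2) for f1 in a1["final"] for f2 in a2["final"]}
    return {"init": (a1["init"], a2["init"]), "final": finals, "trans": trans}


# Hypothetical automata; a1 has two accepting paths for the word "ab".
a1 = {"init": "q0", "final": {"q3"},
      "trans": {("q0", "a", "q1"): 1, ("q0", "a", "q2"): 5,
                ("q1", "b", "q3"): 2, ("q2", "b", "q3"): 0}}
a2 = {"init": "p0", "final": {"p2"},
      "trans": {("p0", "a", "p1"): 4, ("p1", "b", "p2"): 7}}

# Commutative domain: tensor is extend, and DeTensor is the identity.
a3 = product(a1, a2, tensor=extend)
lhs = word_weight(a3, "ab")
rhs = extend(word_weight(a1, "ab"), word_weight(a2, "ab"))
print(lhs, rhs)
```

The same `product` function would apply to a non-commutative domain if a suitable `tensor` were supplied; only the choice `tensor=extend` relies on commutativity.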

However, commutative domains are not useful for CBA: under a commutative extend, interference from other threads can have no effect on the execution of a thread. Nevertheless, such domains still play an important role in constructing STPs. We show that STPs can be constructed for matrix domains built on top of a commutative domain.

Definition 6.5.2. Let $S_c = (D_c, \oplus_c, \otimes_c, 0_c, 1_c)$ be a commutative weight domain. Then a matrix weight domain on $S_c$ of order $n$ is a weight domain $S = (D, \oplus, \otimes, 0, 1)$ such that $D$ is the set of all matrices of size $n \times n$ with elements from $D_c$; $\oplus$ on matrices is elementwise $\oplus_c$; $\otimes$ of matrices is matrix multiplication; $0$ is the matrix in which all elements are $0_c$; $1$ is the identity matrix ($1_c$ on the primary diagonal and $0_c$ everywhere else).

The reader can verify that $S$, as defined above, is indeed a bounded idempotent semiring (even when $S_c$ is not commutative). Let $B$ be the Boolean weight domain with elements $1_B$ and $0_B$. The relational weight domain (Defn. 2.4.13) on a set $G = \{g_1, g_2, \cdots, g_{|G|}\}$ is a matrix weight domain on $B$ of order $|G|$: a binary relation on $G$ can be represented as a matrix such that the $(i,j)$ entry of the matrix is $1_B$ if and only if $(g_i, g_j)$ is in the relation. Relational composition then corresponds to matrix multiplication. Similarly, the relational weight domain on $(G, S_c)$ (Defn. 2.4.16) is a matrix weight domain on $S_c$ of order $|G|$, provided $S_c$ is commutative.

The advantage of looking at weights as matrices is that it gives us essential structure to manipulate for constructing the STP. We need the following operation on matrices: the Kronecker product [95] of two matrices $A$ and $B$, of sizes $n_1 \times n_2$ and $n_3 \times n_4$, respectively, is a matrix $C$ of size $(n_1 n_3) \times (n_2 n_4)$ such that $C(i,j) = A(i \;\mathrm{div}\; n_3,\ j \;\mathrm{div}\; n_4) \otimes B(i \bmod n_3,\ j \bmod n_4)$, where matrix indices start from zero, and “div” is integer division. It is much easier to understand this definition pictorially (writing $A(i,j)$ as $a_{ij}$):

\[
C = \begin{pmatrix}
a_{00}B & \cdots & a_{0(n_2-1)}B \\
\vdots & \ddots & \vdots \\
a_{(n_1-1)0}B & \cdots & a_{(n_1-1)(n_2-1)}B
\end{pmatrix}
\]


The Kronecker product, written as $\odot$, is an associative operation. Moreover, it is well known that for matrices $A, B, C, D$ with elements that have commutative multiplication, $(A \odot B) \otimes (C \odot D) = (A \otimes C) \odot (B \otimes D)$.
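The index formula for the Kronecker product and the mixed-product identity can be checked directly. This is a small sketch that uses ordinary integer arithmetic as a stand-in for a commutative weight domain (an assumption made for illustration); the matrices are arbitrary examples.

```python
def kron(a, b):
    """Kronecker product via C(i, j) = A(i div n3, j div n4) * B(i mod n3, j mod n4)."""
    n1, n2 = len(a), len(a[0])
    n3, n4 = len(b), len(b[0])
    return [[a[i // n3][j // n4] * b[i % n3][j % n4]
             for j in range(n2 * n4)]
            for i in range(n1 * n3)]


def matmul(a, b):
    """Ordinary matrix multiplication (plays the role of extend)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
D = [[1, 1], [0, 2]]

# Block picture: kron(A, B) consists of the blocks a_ij * B.
blocks = kron(A, B)

# Mixed-product identity: (A kron B)(C kron D) == (A C) kron (B D).
lhs = matmul(kron(A, B), kron(C, D))
rhs = kron(matmul(A, C), matmul(B, D))
print(blocks)
print(lhs == rhs)
```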

Note that the Kronecker product $m = m_1 \odot m_2$ has all pairwise products of elements from $m_1$ and $m_2$. We will rearrange the entries of $m_1 \odot m_2$, using only linear transformations, to obtain $m_1 \otimes m_2$. Let $m_1$ and $m_2$ be matrices of size $k \times k$, so that $m$ is of size $k^2 \times k^2$.

Let $p_{(l,i)}$ be a matrix of size $k \times k^2$ such that all of its entries are 0, except for the $(l,i)$th entry, which is 1. Let $q_{(j,r)}$ be a matrix of size $k^2 \times k$ such that all of its entries are 0, except for the $(j,r)$th entry, which is 1. Then the matrix $m^{(i,j)}_{(l,r)} = (p_{(l,i)}\, m\, q_{(j,r)})$, where juxtaposition denotes matrix multiplication, selects the $(i,j)$th entry of $m$ and moves it to the $(l,r)$th entry of a $k \times k$ matrix. All other entries of the resultant matrix are 0. Moreover, the transformation from $m$ to $m^{(i,j)}_{(l,r)}$ is linear, i.e., it distributes over matrix addition (combine).

Let $\mathbb{Z}_n = \{0, 1, \cdots, n-1\}$. Let $S$ be a subset of $(\mathbb{Z}_{k^2} \times \mathbb{Z}_{k^2}) \times (\mathbb{Z}_k \times \mathbb{Z}_k)$. We define $S$ to map from the index of an entry in $m = m_1 \odot m_2$ to its position in the product $m_1 \otimes m_2$, if it exists. For instance, $m(2k, k-1) = m_1(2, 0) \otimes m_2(0, k-1)$, which contributes to the entry $(m_1 \otimes m_2)(2, k-1)$, i.e., it is one of the summands in the sum that defines $(m_1 \otimes m_2)(2, k-1)$. Thus, we include $((2k, k-1), (2, k-1))$ in $S$. Formally, $S$ is defined as follows:

\[
S = \{ ((i,j),(l,r)) \mid l = (i \;\mathrm{div}\; k),\ r = (j \bmod k),\ \text{and}\ (j \;\mathrm{div}\; k = i \bmod k) \}
\]

We are now ready to define the DeTensor operation. For any matrix $m$, define the expression $\theta_m$ to be the following:

\[
\theta_m = \bigoplus_{((i,j),(l,r)) \in S} p_{(l,i)}\, m\, q_{(j,r)}
\]

By construction, $\theta_{m_1 \odot m_2} = (m_1 \otimes m_2)$. This can be generalized to multiple matrices to obtain an expression $\theta_m$ of the same form as above, such that $\theta_{m_1 \odot \cdots \odot m_n} = m_1 \otimes \cdots \otimes m_n$. The advantage of having an expression of this form is that $\theta_{m_1 \oplus m_2} = \theta_{m_1} \oplus \theta_{m_2}$ (because matrix multiplication distributes over matrix addition, or combine).
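The DeTensor expression can be exercised on small matrices. The sketch below again assumes integer arithmetic as a stand-in for a commutative weight domain; instead of multiplying by the selector matrices explicitly, it adds each selected entry of $m$ into its target position, which has the same effect. The helper names (`selection`, `detensor`, `add`) are illustrative choices, not names from the text.

```python
def kron(a, b):
    n3, n4 = len(b), len(b[0])
    return [[a[i // n3][j // n4] * b[i % n3][j % n4]
             for j in range(len(a[0]) * n4)]
            for i in range(len(a) * n3)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    """Elementwise addition (plays the role of combine)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def selection(k):
    """The index map: ((i, j), (l, r)) with l = i div k, r = j mod k, j div k = i mod k."""
    return {((i, j), (i // k, j % k))
            for i in range(k * k) for j in range(k * k)
            if j // k == i % k}


def detensor(m, k):
    """Sum every selected entry m(i, j) into position (l, r) of a k x k matrix."""
    out = [[0] * k for _ in range(k)]
    for (i, j), (l, r) in selection(k):
        out[l][r] += m[i][j]
    return out


m1 = [[1, 2], [3, 4]]
m2 = [[0, 1], [5, 6]]

# Read-out of a tensored weight equals the ordinary product m1 m2.
read_out = detensor(kron(m1, m2), 2)

# Linearity: DeTensor distributes over addition of tensored weights.
x, y = kron(m1, m2), kron(m2, m1)
linear = detensor(add(x, y), 2) == add(detensor(x, 2), detensor(y, 2))
print(read_out, linear)
```

For $k = 3$, the pair $((2k, k-1), (2, k-1)) = ((6, 2), (2, 2))$ used as an example in the text is a member of `selection(3)`.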

Theorem 6.5.3. An n-STP exists on matrix domains for all $n$. If $S$ is a matrix domain of order $r$, then its n-STP is a matrix domain of order $r^n$ with the following operations: the tensor product of weights is defined as their Kronecker product, and the DeTensor operation is defined as $\lambda m.\theta_m$.

The necessary properties for the tensor operation follow from those for Kronecker product and the expression $\theta_m$. Commutativity of the underlying semiring is needed to show property 1 of Defn. 6.5.1: it is necessary to rearrange a product $(w_1 \otimes_c w_1' \otimes_c w_2 \otimes_c w_2')$ as $(w_1 \otimes_c w_2 \otimes_c w_1' \otimes_c w_2')$. This also implies that the tensor operation is associative and one can build weights in the nth STP from a weight in the $(n-1)$th STP and the original matrix weight domain by taking the Kronecker product. This, in turn, implies that the sequential product of $n$ automata can be built from that of the first $(n-1)$ automata and the last automaton. The same holds for composing $n$ transducers. Therefore, the context bound can be increased incrementally, and the transducer constructed for $(\Rightarrow_{ec})^k$ can be used to construct one for $(\Rightarrow_{ec})^{k+1}$.

The weight domain for ARA (Section 3.5.2) is not quite a matrix weight domain, but it is similar. The weights are sets of matrices over integers, which have a commutative multiplication. Extend is elementwise matrix multiplication and combine is elementwise matrix addition. Therefore, defining the tensor and DeTensor operations as for the matrix domains (but elementwise), we obtain most of the desired properties. However, just as for interprocedural ARA one needed to prove two properties to show that combine and extend can be carried out on the basis instead of the whole vector space, one needs to prove the same for tensor and DeTensor, for weights $w_1, w_2$:

\[
\beta(w_1 \odot w_2) = \beta(\beta(w_1) \odot \beta(w_2))
\]

\[
\beta(\mathit{DeTensor}(w_1)) = \beta(\mathit{DeTensor}(\beta(w_1)))
\]

These properties follow quite trivially from the linearity of Kronecker product and the DeTensor operator (both distribute over addition).

6.5.3  Composing  Transducers 

If our weighted transducers were unidirectional (completely forwards or completely backwards) then composing them would be the same as taking the product of weighted automata: the weights on matching transitions would get tensored together. However, our transducers are partitioned, and have both a forwards component and a backwards component. To handle the partitioning, we need additional operations on weights.

Definition 6.5.4. Let $S = (D, \oplus, \otimes, 0, 1)$ be a weight domain. Then a transpose operation on this domain is defined as $(\cdot)^T : D \rightarrow D$ such that for all $w_1, w_2 \in D$, $w_1^T \otimes w_2^T = (w_2 \otimes w_1)^T$ and it is its own inverse: $(w_1^T)^T = w_1$. An n-transposable STP (TSTP) on $S$ is defined as an n-STP along with another de-tensor-like operation $\mathit{TDeTensor} : D_T \rightarrow D$ such that $\mathit{TDeTensor}(\odot(w_1, w_2, \cdots, w_n)) = w_1 \otimes w_2^T \otimes w_3 \otimes w_4^T \otimes \cdots \otimes w_n'$, where $w_n' = w_n$ if $n$ is odd and $w_n^T$ if $n$ is even.

TSTPs always exist for matrix domains: the transpose operation is just the matrix-transpose operation, and the TDeTensor operation can be defined using an expression similar to that for DeTensor. We can use TSTPs to remove the partitioning. Let $\tau$ be a partitioned weighted transducer on $S$, for which a transpose exists, as well as a 2-TSTP. The partitioning on the states of $\tau$ naturally defines a partitioning on its transitions as well (a transition is said to belong to the partition of its source state). Replace weights $w_1$ in the first (forwards) partition with $(w_1 \odot 1)$, and weights $w_2$ in the second (backwards) partition with $(1 \odot w_2^T)$. This gives a completely forwards transducer $\tau'$ (without any partitioning). The invariant is that for any sets of configurations $S$ and $T$, $\tau(S, T)$, which is the combine over all weights with which the transducer accepts $(s, t)$, $s \in S$, $t \in T$, equals $\mathit{TDeTensor}(\tau'(S, T))$.
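The re-encoding of forwards and backwards weights can be illustrated on a matrix domain. The sketch below assumes integer matrices as weights and a Kronecker-product tensor. The text does not spell out the TDeTensor expression, so the selection rule in `tdetensor` is derived here from the requirement that it map $w_1 \odot w_2$ to $w_1 \otimes w_2^T$ and should be read as an assumption. The weights `f1`, `f2` (forwards) and `b1`, `b2` (backwards) are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    n3, n4 = len(b), len(b[0])
    return [[a[i // n3][j // n4] * b[i % n3][j % n4]
             for j in range(len(a[0]) * n4)]
            for i in range(len(a) * n3)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def identity(k):
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]


def tdetensor(m, k):
    """Assumed read-out for n = 2: maps kron(w1, w2) to w1 * transpose(w2)."""
    out = [[0] * k for _ in range(k)]
    for i in range(k * k):
        for j in range(k * k):
            if j // k == j % k:
                out[i // k][i % k] += m[i][j]
    return out


K = 2
f1, f2 = [[1, 2], [0, 1]], [[2, 0], [1, 3]]   # forwards weights, in path order
b1, b2 = [[0, 1], [1, 1]], [[1, 4], [2, 0]]   # backwards weights, in path order
one = identity(K)

# Forwards weight w becomes (w tensor 1); backwards weight w becomes (1 tensor w^T).
encoded = [kron(f1, one), kron(f2, one),
           kron(one, transpose(b1)), kron(one, transpose(b2))]

# Extend the encoded weights left to right, as a purely forwards transducer would.
acc = kron(one, one)
for w in encoded:
    acc = matmul(acc, w)

# Read-out: forwards weights in order, backwards weights in reverse order.
expected = matmul(matmul(f1, f2), matmul(b2, b1))
print(tdetensor(acc, K) == expected)
```

The second tensor component accumulates $b_1^T \otimes b_2^T = (b_2 \otimes b_1)^T$, so transposing it during read-out yields the backwards weights in reverse order, which is why the transpose law of Defn. 6.5.4 is needed.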

This can be extended to compose partitioned weighted transducers. Composing $n$ transducers requires a 2n-TSTP. First, each transducer is converted to a non-partitioned one over the 2-TSTP domain. Then input/output labels are matched just as for unweighted transducers, and the weights are tensored together.

Theorem 6.5.5. Given $n$ weighted transducers $\tau_1, \cdots, \tau_n$ on a weight domain with a 2n-TSTP, the above construction produces a weighted transducer $\tau$ such that for any sets of configurations $S$ and $T$, $\mathit{TDeTensor}(\tau(S, T)) = R(S, T)$, where $R$ is the weighted composition of $\mathcal{L}(\tau_1), \cdots, \mathcal{L}(\tau_n)$.

Putting  it  all  together 

Using the construction from Section 6.4, we can construct a transducer $\tau_i$ for the (weighted) transition relation of thread $t_i$, i.e., for $\Rightarrow_{t_i}^*$. By extending $\tau_i$ to perform the identity transformation on stack symbols of threads other than $t_i$ (using transitions of the form $p \rightarrow p$ with weight $1$), we obtain a transducer $\tau_i^e$ for $(\Rightarrow_{t_i}^e)^*$. Next, a union of these transducers gives $\tau_{ec}$, which represents $\Rightarrow_{ec}$. Performing the weighted composition of $\tau_{ec}$ $k$ times with itself gives us a transducer $\tau$ that represents $(\Rightarrow_{ec})^{k+1}$.

If automaton $\mathcal{A}_S$ represents the set of starting states of a program, $\tau(\mathcal{A}_S)$ provides a weighted automaton $\mathcal{A}$ that describes all reachable states (under the context bound), i.e., the weight $\mathcal{A}(t)$ gives the net transformation in data state in going from $S$ to $t$ ($0$ if $t$ is not reachable).

For instance, to see how all this works out, consider a concurrent program with two threads. Furthermore, suppose that the WPDSs for the two threads have a single PDS control state $p$. In this case, the composition $(\tau_1^e; \tau_2^e)$ represents all behaviors with one context switch, in which $t_1$ executes before $t_2$. We know that $\tau_1^e((p, c_1, c_2), (p, c_1', c_2)) = \mathrm{IJOP}_{t_1}(\{(p, c_1)\}, \{(p, c_1')\})$, and similarly for $\tau_2^e$. Next, the transducer $(\tau_1^e; \tau_2^e)$ accepts the composition of the weighted languages of $\tau_1^e$ and $\tau_2^e$. Thus,

\[
(\tau_1^e; \tau_2^e)((p, c_1, c_2), (p, c_1', c_2')) = \mathrm{IJOP}_{t_1}(\{(p, c_1)\}, \{(p, c_1')\}) \otimes \mathrm{IJOP}_{t_2}(\{(p, c_2)\}, \{(p, c_2')\})
\]

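which exactly characterizes the set of all behaviors with one context switch in which $t_1$ executes before $t_2$. Next, consider the transducer $((\tau_1^e; \tau_2^e); (\tau_1^e; \tau_2^e))$. This accepts the input-output pair $((p, c_1, c_2), (p, c_1', c_2'))$ with the following weight:

\begin{align*}
 & \bigoplus_{c_1'', c_2''} (\tau_1^e; \tau_2^e)((p, c_1, c_2), (p, c_1'', c_2'')) \otimes (\tau_1^e; \tau_2^e)((p, c_1'', c_2''), (p, c_1', c_2')) \\
 = & \bigoplus_{c_1'', c_2''} \mathrm{IJOP}_{t_1}(\{(p, c_1)\}, \{(p, c_1'')\}) \otimes \mathrm{IJOP}_{t_2}(\{(p, c_2)\}, \{(p, c_2'')\}) \\
 & \qquad \otimes\ \mathrm{IJOP}_{t_1}(\{(p, c_1'')\}, \{(p, c_1')\}) \otimes \mathrm{IJOP}_{t_2}(\{(p, c_2'')\}, \{(p, c_2')\})
\end{align*}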

This weight summarizes the net effect of all paths with three context switches in which the threads execute in the order $t_1, t_2, t_1, t_2$.


6.6  Implementing  CBA 

This  chapter  developed  novel  machinery  that  shows  how  precise  CBA  can  be  carried  out 
for  various  abstractions,  including  infinite-state  abstractions.  These  algorithms  may  have 
practical  value,  as  well.  The  QR  algorithm  requires  an  explicit  fan-out  proportional  to  |G| 
for  each  context  switch,  which  can  be  very  large.  To  some  extent,  this  huge  complexity  is 
unavoidable,  as  shown  by  the  following  result. 

Theorem 6.6.1. The language $\{ (M, 0^k, c_1, c_2) \mid M$ is a concurrent PDS, $c_1$ and $c_2$ are configurations of $M$, and $c_1 (\Rightarrow_{ec})^{k+1} c_2 \}$ is NP-complete.

Proof. [Sketch] The proof follows from two earlier pieces of work. Ramalingam [80] showed that reachability in multi-threaded programs with synchronization primitives is undecidable by giving a reduction from Post’s correspondence problem (PCP) [74]. We also know that bounded-PCP is NP-complete [34, Problem SR11]. It is easy to see that a program can use shared memory and a bounded number of context switches to simulate a similar number of synchronization steps. Thus, Ramalingam’s reduction can be used to give a reduction from bounded-PCP to CBMC. This proves NP-hardness of CBMC.

Next, we show that CBMC is in NP. First, guess a $k$-tuple of PDS control states (or global states) $(p_1, \cdots, p_k)$, which will represent the history of global states at all the context switches. Next, we run the QR algorithm, but instead of performing a fan-out on the set of all reachable global states, restrict the fan-out at level $i+1$ of the computation tree (Fig. 6.2) to just the states that contain the global state $p_i$. This means that the computation tree is pruned to one single branch. The running time of QR then becomes polynomial (because it only requires $k + 1$ poststar queries). If the set of reachable states reported by this algorithm includes the target $c_2$, then output “yes”, otherwise output “no”. It is easy to see that this non-deterministic algorithm solves CBMC in polynomial time. □

Note  that  the  analysis  of  sequential  Boolean  programs  is  PSPACE-complete  (in  the  size  of 
the  Boolean  program;  the  above  result  is  in  terms  of  the  size  of  the  PDS),  but  tools  [85,  6,  37] 
are  able  to  handle  them  efficiently,  essentially,  by  using  BDDs  to  encode  weights  (or  binary 
relations).  The  fan-out  operation  of  the  QR  algorithm  requires  explicit  enumeration  of 
global  states,  which  destroys  the  sharing  that  existed  in  the  BDDs.  Our  algorithm,  based  on 
transducers,  requires  no  fan-out,  and  BDD-encoded  valuations  never  need  to  be  enumerated. 

We used matrix domains only to prove the existence of STPs. Weights need not be represented using matrices. If binary relations are represented using BDDs, then taking their tensor product reduces to concatenating BDDs (and doubling the number of BDD variables), which is a linear-time operation. Composing $k$ transducers would produce BDDs with $k$ times the variables (a linear increase). The disadvantage of our algorithm is that the transducers we create have $|\Gamma|$ states (where $\Gamma$ is the set of program control locations) and, consequently, the final transducer may have $|\Gamma|^k$ states. However, considering the fact that solving CBA just requires one query on this large transducer, we can use techniques such as building it lazily [64] or exploiting the symmetric structure of compositions (the same transducer is composed each time).
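The remark that weights need not be matrices can be made concrete with binary relations. The sketch below uses explicit Python sets where the text uses BDDs (an assumption made purely for illustration): the tensor of two relations is a relation over pairs, which corresponds to placing the two relations side by side over disjoint variables, and DeTensor keeps the tuples whose middle values agree. The function names and sample relations are hypothetical.

```python
def compose(r1, r2):
    """Extend: relational composition (also used on the tensored domain)."""
    return frozenset((a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2)


def tensor(r1, r2):
    """Relation over pairs: ((a, c), (b, d)) with (a, b) in r1 and (c, d) in r2."""
    return frozenset(((a, c), (b, d)) for (a, b) in r1 for (c, d) in r2)


def detensor(t):
    """Keep tuples in which r1's output meets r2's input, then project it away."""
    return frozenset((a, d) for ((a, c), (b, d)) in t if b == c)


r1 = frozenset({(0, 1), (1, 1)})
r2 = frozenset({(1, 2), (0, 0)})
r3 = frozenset({(1, 0), (1, 2)})
r4 = frozenset({(2, 2), (0, 1)})

# Property 1: extend of tensored weights is componentwise.
prop1 = compose(tensor(r1, r2), tensor(r3, r4)) == tensor(compose(r1, r3), compose(r2, r4))

# Property 2: DeTensor reads out the extend of the components.
prop2 = detensor(tensor(r1, r2)) == compose(r1, r2)

# Property 3: DeTensor distributes over combine (union), so no cross terms arise.
a1 = {"c1": frozenset({(0, 1)}), "c2": frozenset({(0, 2)})}
a2 = {"c1": frozenset({(2, 3)}), "c2": frozenset({(1, 4)})}
tensored = tensor(a1["c1"], a2["c1"]) | tensor(a1["c2"], a2["c2"])
precise = compose(a1["c1"], a2["c1"]) | compose(a1["c2"], a2["c2"])
prop3 = detensor(tensored) == precise
print(prop1, prop2, prop3)
```

Unlike componentwise pairing, the tensored relation remembers which first-component tuple goes with which second-component tuple, which is what allows the read-out to remain precise after a combine.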

The  next  chapter  builds  on  some  of  the  ideas  discussed  in  this  chapter  to  design  a  scalable 
algorithm  for  CBA  that  is  able  to  do  verification  of  real  concurrent  programs. 

6.7  Related  Work 

CBA of bounded-heap-manipulating Boolean programs is given in [11]. It encodes such Boolean programs using PDSs, and then uses the QR algorithm. The same encoding could be used with either of our (unweighted or weighted) transducer-based algorithms, instead of the QR algorithm.

Reachability analysis of concurrent recursive programs has also been considered in [10, 72, 19]. These tackle the problem by computing over-approximations of the execution paths of the program, whereas here we compute under-approximations (bounded context) of the reachable configurations. Analysis under restricted communication policies (in contrast to shared memory) has also been considered [12, 43].

Constructing transducers. As mentioned in the introduction, a transducer construction for solving reachability in PDSs was given earlier by Caucal [17]. However, the construction was given for prefix-rewriting systems in general and is not accompanied by a complexity result, except for the fact that it runs in polynomial time. Our construction for PDSs, obtained as a special case of the construction given in Section 6.4, is quite efficient. The technique, however, seems to be related. Caucal constructed the transducer by exploiting the fact that the language of the transducer is a union of the relations $(\mathit{pre}^*((p, \gamma)), \mathit{post}^*((p, \gamma)))$ for all $p \in P$ and $\gamma \in \Gamma$, with an identity relation appended onto them to accept the untouched part of the stack. This is similar to our decomposition of PDS paths (see Fig. 6.3). Construction of a transducer for WPDSs has not been considered before. This was crucial for developing an algorithm for general CBA.

The pop-function used in Section 6.4 represents summary information about paths, and is similar to the use of composed transformer functions from [25], summary functions from [88], summary edges from [81], and summary micro-functions from [84]. In all of these cases, information is tabulated that summarizes the net effect of following all possible paths from certain kinds of sources to certain kinds of targets. The path information is pre-computed and added to a structure that is used for answering queries.

One difference between our work and the aforementioned work is that in all of the latter the paths summarized are same-level valid paths (paths in which pushes and pops match as in a language of balanced parentheses), whereas the pop-function summarizes paths that result in the net loss of a stack symbol. In this respect, the pop-function is more like the “unbalanced-by-1” summarization information used in the simulation technique for testing membership of a string in the language accepted by a 2NDPDA (2-way non-deterministic PDA) [1]. Note that the “unbalanced-by-1” nature of the pop-function is what makes it useful in an automaton construction (i.e., the popped symbol corresponds to a letter consumed by the automaton).

Composing transducers. There is a large body of work on weighted automata and weighted transducers in the speech-recognition community [64, 65]. However, the weights in their applications usually satisfy many more properties than those of a semiring, including the existence of an inverse and commutativity of extend. We refrain from making such assumptions.

Tensor products have been used previously in program analysis for combining abstractions [71]. We use them in a different context and for a different purpose. In particular, previous work has used them for combining abstractions that are performed in lock-step; in contrast, we use them to stitch together the data state before a context switch with the data state after a context switch.

Chapter  7 

Reducing  Concurrent  Analysis  Under  a  Context  Bound 
to  Sequential  Analysis 

This chapter presents a second result towards our goal of automatically extending analyses for sequential programs to analyses for concurrent programs under a bound on the number of context switches. Chapter 6 showed that the existence of a tensor-product operation automatically enabled precise CBA of various program models (namely, all those that could be encoded using a weighted pushdown system). In this chapter, we present a more direct way of obtaining algorithms for CBA that does not require tensor products.

Let us recall the existing results on CBA. The decidability of CBA, when each program thread is abstracted as a pushdown system (PDS), was shown in [77]. This result was extended to PDSs with bounded heaps in [11]. Our work, which was described in Chapter 6, extended the result to weighted PDSs (WPDSs). All of this work required devising new algorithms. Moreover, each of the algorithms has certain disadvantages towards realizing a practical implementation.

In  the  sequential  setting,  model  checkers,  such  as  those  described  in  [6,  85,  37],  use 
symbolic  techniques  in  the  form  of  binary  decision  diagrams  (BDDs)  for  scalability.  With 
the  CBA  algorithms  of  [77,  11],  it  is  not  clear  if  symbolic  techniques  can  be  applied.  Those 
algorithms  require  the  enumeration  of  all  reachable  states  of  the  shared  memory  at  a  context 
switch.  This  can  potentially  be  very  expensive.  However,  those  algorithms  have  the  nice 
property  that  they  only  consider  those  states  that  actually  arise  during  valid  (abstract) 
executions  of  the  model.  (We  call  this  lazy  exploration  of  the  state  space.) 

The results presented in Chapter 6 extend the algorithm of [77] to use symbolic techniques. However, the disadvantage there is that it requires computing auxiliary information, in the form of a transducer, for exploring the reachable state space. (We call this eager exploration of the state space.) The transducer summarizes the effect of executing a thread from any control location to any other control location, and hence may consider many more program behaviors than can actually occur in a valid execution of the program (whence the term “eager”).

This contrast between lazy and eager approaches can also be illustrated by considering interprocedural analysis of sequential programs: for a procedure, it is possible to construct a summary that describes the effect of executing it for any possible inputs to the procedure (eager computation of the summary). It is also possible to construct the summary lazily (also called partial transfer functions [69]) by only describing the effect of executing the procedure for input states under which it is called during the analysis of the program. The former (eager) approach has been successfully applied to Boolean programs [6], but the latter (lazy) approach is often desirable in the presence of more complex abstractions, especially those that contain pointers (based on the intuition that only a few aliasing scenarios occur during abstract execution). Interprocedural analysis frameworks, like the Sharir and Pnueli tabulation algorithm [88], the Reps-Horwitz-Sagiv graph reachability approach [81], and others [84] are also lazy. The option of switching between eager and lazy exploration exists in some model checkers [6, 50].

Contributions 

The  work  presented  in  this  chapter  makes  three  main  contributions.  First,  we  show  how 
to  reduce  a  concurrent  program  to  a  sequential  one  that  simulates  all  its  executions  for  a 
given  number  of  context  switches.  This  has  the  following  advantages: 


•  It allows one to obtain algorithms for CBA using different program abstractions. We specialize the reduction to Boolean programs (Section 7.2), PDSs (Section 7.3), symbolic PDSs (Section 7.4), and WPDSs (Section 7.5). The reduction for Boolean programs shows that the use of PDS-based technology, which seemed crucial in previous work, is not necessary: standard interprocedural algorithms [81, 88, 52] can also be used for CBA. Moreover, it allows one to carry over symbolic techniques designed for sequential programs to CBA.

•  Our  reduction  provides  a  way  to  harness  existing  abstraction  techniques  to  obtain 
new  algorithms  for  CBA.  The  reduction  introduces  symbolic  constants  and  assume 
statements.  Thus,  any  sequential  analysis  that  can  deal  with  these  two  features  can 
be  extended  to  handle  concurrent  programs  as  well  (under  a  context  bound). 

Symbolic  constants  are  only  associated  with  the  shared  data  in  the  program.  When 
only  a  finite  amount  of  data  is  shared  between  the  threads  of  a  program  (e.g.,  there  are 
only  a  finite  number  of  locks),  any  sequential  analysis,  even  of  programs  with  pointers 
or  integers,  can  be  extended  to  perform  CBA  of  concurrent  programs.  When  the  shared 
data  is  not  finite,  our  reduction  still  applies;  for  instance,  numeric  analyses,  such  as 
polyhedral  analysis  [27],  can  be  applied  to  CBA  of  concurrent  programs. 

• For the case in which a PDS is used to model each thread, we obtain better asymptotic
complexity than previous algorithms, just by using the standard PDS algorithms
(Section 7.3).

•  The  reduction  shows  how  to  obtain  algorithms  that  scale  linearly  with  the  number  of 
threads  (whereas  previous  algorithms  scaled  exponentially). 

Second,  we  show  how  to  obtain  a  lazy  symbolic  algorithm  for  CBA  on  Boolean  programs 
(Section  7.6).  This  combines  the  best  of  previous  algorithms:  the  algorithms  of  [77,  11]  are 
lazy  but  not  symbolic,  and  the  algorithm  presented  in  Chapter  6  is  symbolic  but  not  lazy. 


Third,  we  implemented  both  eager  and  lazy  algorithms  for  CBA  on  Boolean  programs. 
We  report  the  scalability  of  these  algorithms  on  programs  obtained  from  various  sources  and 
also  show  that  most  bugs  can  be  found  in  a  few  context  switches  (Section  7.7). 

The  rest  of  this  chapter  is  organized  as  follows:  Section  7.1  gives  a  general  reduction 
from  concurrent  to  sequential  programs;  Section  7.2  specializes  the  reduction  to  Boolean 
programs;  Section  7.3  specializes  the  reduction  to  PDSs;  Section  7.4  specializes  the  reduction 
to  symbolic  PDSs;  Section  7.5  specializes  the  reduction  to  WPDSs;  Section  7.6  gives  a 
lazy  symbolic  algorithm  for  CBA  on  Boolean  programs;  Section  7.7  reports  experiments 
performed  using  both  eager  and  lazy  versions  of  the  algorithms  presented  in  this  chapter; 
Section  7.8  discusses  related  work.  Proofs  can  be  found  in  Section  7.9. 

7.1  A  General  Reduction 

This  section  gives  a  general  reduction  from  concurrent  programs  to  sequential  programs 
under a given context bound. This reduction transforms the non-determinism in control,
which arises because of concurrency, to non-determinism on data. (The motivation is that
the latter problem is understood much better than the former one.)

The execution of a concurrent program proceeds in a sequence of execution contexts,
defined  as  the  time  between  consecutive  context  switches  during  which  only  a  single  thread 
has  control.  We  do  not  consider  dynamic  creation  of  threads,  and  assume  that  a  concurrent 
program  is  given  as  a  fixed  set  of  threads,  with  one  thread  identified  as  the  starting  thread. 

Suppose that a program has two threads, $T_1$ and $T_2$, and that the context-switch bound
is $2K - 1$. Then any execution of the program under this bound will have up to $2K$
execution contexts, with control alternating between the two threads, informally written as
$T_1; T_2; T_1; \cdots$. Each thread has control for at most $K$ execution contexts. Consider three
consecutive execution contexts $T_1; T_2; T_1$. When $T_1$ finishes executing the first of these, it
gets swapped out and its local state, say $l$, is stored. Then $T_2$ gets to run, and when it is
swapped out, $T_1$ has to resume execution from $l$ (along with the global store produced by
$T_2$).

The requirement of resuming from the same local state is one difficulty that makes analysis
of concurrent programs hard: during the analysis of $T_2$, the local state of $T_1$ has to be
remembered (even though it is unchanging). This forces one to consider the cross product
of the local states of the threads, which causes exponential blowup when the local state
space is finite, and undecidability when the local state includes a stack. An advantage of
introducing a context bound is the reduced complexity with respect to the size $|L|$ of the
local state space: the algorithms of [77, 11] scale as $O(|L|^5)$, and the one from Chapter 6
scales as $O(|L|^K)$. Our algorithm, for PDSs (Section 7.3), is $O(|L|)$. (Strictly speaking, in
each of these, $|L|$ is the size of the local transition system.)

The key observation is the following: for analyzing $T_1; T_2; T_1$, we modify the threads so
that we only have to analyze $T_1; T_1; T_2$, which eliminates the requirement of having to drag
along the local state of $T_1$ during the analysis of $T_2$. For this, we assume the effect that
$T_2$ might have on the shared memory, apply it while $T_1$ is executing, and then check our
assumption after analyzing $T_2$.

Consider the general case when each of the two threads has $K$ execution contexts. We
refer to the state of shared memory as the global state. First, we guess $K - 1$ (arbitrary) global
states, say $s_1, s_2, \cdots, s_{K-1}$. We run $T_1$ so that it starts executing from the initial state $s_0$
of the shared memory. At a non-deterministically chosen time, we record the current global
state $s'_1$, change it to $s_1$, and resume execution of $T_1$. Again, at a non-deterministically
chosen time, we record the current global state $s'_2$, change it to $s_2$, and resume execution of
$T_1$. This continues $K - 1$ times. Implicitly, this implies that we assumed that the execution of
$T_2$ will change the global state from $s'_i$ to $s_i$ in its $i$th execution context. Next, we repeat this
for $T_2$: we start executing $T_2$ from $s'_1$. At a non-deterministically chosen time, we record the
global state $s''_1$, we change it to $s'_2$ and repeat $K - 1$ times. Finally, we verify our assumption:
we check that $s''_i = s_{i+1}$ for all $i$ between 1 and $K - 1$. If these checks pass, we have the
guarantee that $T_2$ can reach state $s$ if and only if the concurrent program can have the global
state $s$ after $K$ execution contexts per thread.

The fact that we do not alternate between $T_1$ and $T_2$ implies the linear scalability with
respect to $|L|$. Because the above process has to be repeated for all valid guesses, our
approach scales as $O(|G|^K)$, where $G$ is the global state space. In general, the exponential
complexity with respect to $K$ may not be avoidable because the problem is NP-complete
when the input has $K$ written in unary (Thm. 6.6.1). However, symbolic techniques can be
used for a practical implementation.

We show how to reduce the above assume-guarantee process into one of analyzing a
sequential program. We add more variables to the program, initialized with symbolic
constants, to represent our guesses. The switch from one global state to another is made by
switching the set of variables being accessed by the program. We verify the guesses by
inserting assume statements at the end.

7.1.1  The  reduction 

Consider a concurrent program $P$ with two threads $T_1$ and $T_2$ that only has scalar
variables (i.e., no pointers, arrays, or heap).$^1$ We assume that the threads share their global
variables, i.e., they have the same set of global variables. Let $\mathit{Var}_G$ be the set of global
variables of $P$. Let $2K - 1$ be the bound on the number of context switches.

The result of our reduction is a sequential program $P^s$. It has three parts, performed
in sequence: the first part $T_1^s$ is a reduction of $T_1$; the second part $T_2^s$ is a reduction of $T_2$;
and the third part, Checker, consists of multiple assume statements to verify that a correct
interleaving was performed. Let $L_i$ be the label preceding the $i$th part. $P^s$ has the form
shown in the first column of Fig. 7.1.

The global variables of $P^s$ are $K$ copies of $\mathit{Var}_G$. If $\mathit{Var}_G = \{x_1, \cdots, x_n\}$, then let
$\mathit{Var}_G^i = \{x_1^i, \cdots, x_n^i\}$. The initial values of $\mathit{Var}_G^i$ are a set of symbolic constants that
represent the $i$th guess $s_i$. $P^s$ has an additional global variable $k$, which will take values
between 1 and $K + 1$. It tracks the current execution context of a thread: at any time $P^s$
can only read and write to variables in $\mathit{Var}_G^k$. The local variables of $T_i^s$ are the same as
those of $T_i$.

$^1$ Such models are often used in model checking and numeric program analysis.

Column 1: Program $P^s$

    $L_1$ : $T_1^s$;
    $L_2$ : $T_2^s$;
    $L_3$ : Checker;

Column 2: $st \in T_i$

    if k = 1 then
        $\tau(st, 1)$;
    else if k = 2 then
        $\tau(st, 2)$;
    ...
    else if k = K then
        $\tau(st, K)$;
    end if
    if k $\le$ K and * then
        k++;
    end if
    if k = K + 1 then
        k = 1;
        goto $L_{i+1}$
    end if

Column 3: Checker

    for i = 1 to K - 1 do
        for j = 1 to n do
            assume ($x_j^i = v_j^{i+1}$);
        end for
    end for

Figure 7.1 The reduction for general concurrent programs under a context bound $2K - 1$.
In the second column, * stands for a nondeterministic Boolean value.

Let $\tau(x, i) = x^i$. If st is a program statement in $P$, let $\tau(st, i)$ be the statement in which
each global variable $x$ is replaced with $\tau(x, i)$, and the local variables remain unchanged.
The reduction constructs $T_i^s$ from $T_i$ by replacing each statement st by what is shown in the
second column of Fig. 7.1. The third column shows Checker. Variables $\mathit{Var}_G^1$ are initialized
to the same values as $\mathit{Var}_G$ in $P$. Variable $x_j^i$, when $i \neq 1$, is initialized to the symbolic
constant $v_j^i$ (which is later referenced inside Checker), and $k$ is initialized to 1.

Because local variables are not replicated, a thread resumes execution from the same
local state it was in when it was swapped out at a context switch.
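
The statement-level rewriting can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: statements are treated as plain strings, the copy $x^i$ of a global variable is spelled `x_i`, and the helper names `tau` and `transform_statement` are hypothetical names chosen for this example, not part of the reduction itself.

```python
import re


def tau(stmt, i, global_vars):
    """Rename every global variable x in stmt to its i-th copy x_i (locals untouched)."""
    if not global_vars:
        return stmt
    names = sorted(global_vars, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(v) for v in names) + r")\b"
    return re.sub(pattern, lambda m: f"{m.group(0)}_{i}", stmt)


def transform_statement(stmt, K, global_vars, next_label):
    """Emit the replacement for one statement, following the second column of Fig. 7.1."""
    lines = []
    for i in range(1, K + 1):
        keyword = "if" if i == 1 else "else if"
        lines.append(f"{keyword} k = {i} then")
        lines.append(f"  {tau(stmt, i, global_vars)};")
    lines.append("end if")
    # Nondeterministic context switch: advance to the next copy of the globals.
    lines += [f"if k <= {K} and * then", "  k++;", "end if"]
    # After the K-th context, reset k and hand control to the next part.
    lines += [f"if k = {K + 1} then", "  k = 1;", f"  goto {next_label}", "end if"]
    return "\n".join(lines)
```

For instance, `transform_statement("g = g + t", 2, {"g"}, "L2")` rewrites the global `g` to `g_1` or `g_2` depending on `k`, while the local `t` is left as it is.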

The Checker enforces a correct interleaving of the threads. It checks that the values
of global variables when $T_1$ starts its $(i+1)$st execution context are the same as the values
produced by $T_2$ when $T_2$ finished executing its $i$th execution context. (Because the execution
of $T_2^s$ happens after $T_1^s$, each execution context of $T_2^s$ is guaranteed to use the global state
produced by the corresponding execution context of $T_1^s$.)

The reduction ensures the following property: when $P^s$ finishes execution, the variables
$\mathit{Var}_G^K$ can have a valuation $s$ if and only if the variables $\mathit{Var}_G$ in $P$ can have the same
valuation after $2K - 1$ context switches.
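
This property can be checked on small finite-state examples. The following Python sketch is an illustration under assumptions, not the analysis used in this chapter: each thread is modeled by a hypothetical `step(g, l)` function returning the set of successor (global, local) pairs, guesses are enumerated eagerly over a finite set `G`, and the toy threads `t1` and `t2` are invented for the example. `sequential_reach` mimics $T_1^s$; $T_2^s$; Checker over $K$ copies of the global state, and `concurrent_reach` explores the alternating schedule directly.

```python
from itertools import product


def closure(step, g, l):
    """All (g', l') reachable from (g, l) by zero or more steps of a single thread."""
    seen, work = {(g, l)}, [(g, l)]
    while work:
        for nxt in step(*work.pop()):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def concurrent_reach(t1, t2, g0, l1, l2, K):
    """Globals reachable under the schedule (T1; T2)^K, tracking both local states."""
    states = {(g0, l1, l2)}
    for _ in range(K):
        states = {(g2, a2, b) for (g, a, b) in states for (g2, a2) in closure(t1, g, a)}
        states = {(g2, a, b2) for (g, a, b) in states for (g2, b2) in closure(t2, g, b)}
    return {g for (g, _, _) in states}


def run_thread(step, starts, l0, K):
    """Run one thread for K contexts; context k reads and writes only copy k."""
    states = {(c, l0) for c in starts}
    for k in range(K):
        states = {(c[:k] + (g2,) + c[k + 1:], l2)
                  for (c, l) in states
                  for (g2, l2) in closure(step, c[k], l)}
    return {c for (c, _) in states}


def sequential_reach(t1, t2, G, g0, l1, l2, K):
    """Guess copies 2..K, run T1 then T2, and keep runs that pass the Checker."""
    result = set()
    for guess in product(G, repeat=K - 1):
        after_t1 = run_thread(t1, {(g0,) + guess}, l1, K)
        after_t2 = run_thread(t2, after_t1, l2, K)
        for c in after_t2:
            # Checker: final value of copy i must equal the guess used for copy i+1.
            if all(c[i] == guess[i] for i in range(K - 1)):
                result.add(c[K - 1])
    return result


def t1(g, pc):
    """Toy thread: sets g from 0 to 1, later from 2 to 3."""
    if pc == 0 and g == 0:
        return {(1, 1)}
    if pc == 1 and g == 2:
        return {(3, 1)}
    return set()


def t2(g, pc):
    """Toy thread: sets g from 1 to 2."""
    if pc == 0 and g == 1:
        return {(2, 1)}
    return set()
```

On this example, the value 3 needs the schedule $T_1; T_2; T_1$, so it is found with $K = 2$ but not with $K = 1$, and both functions agree on the reachable sets. Note that `run_thread` never carries the other thread's local state, which is the point of the reduction.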

Symbolic  constants 

One  way  to  deal  with  symbolic  constants  is  to  consider  all  possible  values  for  them  (eager 
computation).  We  show  instances  of  this  strategy  for  Boolean  programs  (Section  7.2)  and 
for  PDSs  (Section  7.3).  Another  way  is  to  lazily  consider  the  set  of  values  they  may  actually 
take  during  the  (abstract)  execution  of  the  concurrent  program,  i.e.,  only  consider  those 
values  that  pass  the  Checker.  We  show  an  instance  of  this  strategy  for  Boolean  programs 
(Section  7.6). 

7.1.2  Multiple  threads 

If there are $n$ threads, $n > 2$, then a precise reasoning for $K$ context switches would
require one to consider all possible thread schedulings, e.g., $(T_1; T_2; T_1; T_3)$, $(T_1; T_3; T_2; T_3)$,
etc. There are $O((n-1)^K)$ such schedulings. Previous analyses [77, 11] enumerate explicitly
all these schedulings, and thus have $O((n-1)^K)$ complexity even in the best case. We
avoid this exponential factor as follows: we only consider the round-robin thread schedule
$T_1; T_2; \cdots T_n; T_1; T_2; \cdots$ for CBA, and bound the length of this schedule instead of bounding
the number of context switches. Because a thread is allowed to perform no steps during
its execution context, CBA still considers other schedules. For example, when $n = 3$, the
schedule $T_1; T_2; T_1; T_3$ will be considered while analyzing a round-robin schedule of length
6 (in the round-robin schedule, $T_3$ does nothing in its first execution context, and $T_2$ does
nothing in its second execution context).
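
The embedding of an arbitrary schedule into a round-robin schedule can be made concrete with a short Python sketch. The function name `embed_round_robin` and the representation of a schedule as a list of thread indices (1 to $n$) are assumptions made for this illustration.

```python
def embed_round_robin(schedule, n):
    """Place each context of `schedule` into the next matching round-robin slot.

    Returns a list of (thread, active) pairs; inactive slots are execution
    contexts in which the thread performs no steps.
    """
    slots = []
    pos = 0  # index into the round-robin sequence T1, T2, ..., Tn, T1, ...
    for thread in schedule:
        # Skip (idle) slots until it is this thread's turn.
        while pos % n + 1 != thread:
            slots.append((pos % n + 1, False))
            pos += 1
        slots.append((thread, True))
        pos += 1
    return slots
```

Calling `embed_round_robin([1, 2, 1, 3], 3)` yields six slots in which the first slot of thread 3 and the second slot of thread 2 are idle, matching the example above.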

Setting  the  bound  on  the  length  of  the  round-robin  schedule  to  nK  allows  CBA  to 
consider  all  thread  schedulings  with  K  context  switches  (as  well  as  some  schedulings  with 
more  than  K  context  switches).  Under  such  a  bound,  a  schedule  has  K  execution  contexts 
per  thread. 

The  reduction  for  multiple  threads  proceeds  in  a  similar  way  to  the  reduction  for  two 
threads. The global variables are copied $K$ times. Each thread $T_i$ is transformed to $T_i^s$, as
shown in Fig. 7.1, and $P^s$ calls the $T_i^s$ in sequence, followed by Checker. Checker remains the
same (it only has to check that the state after the execution of $T_n^s$ agrees with the symbolic
constants). 

The advantages of this approach are as follows: (i) we avoid an explicit enumeration
of $O((n-1)^K)$ thread schedules, thus allowing our analysis to be more efficient in the
common case; (ii) we explore more of the program behavior with a round-robin bound of
$nK$ than with a context-switch bound of $K$; and (iii) the cost of analyzing the round-robin
schedule of length $nK$ is about the same as (in fact, better than) what previous analyses take
for exploring one schedule with a context bound of $K$ (see Section 7.3). These advantages
allow  our  analysis  to  scale  much  better  in  the  presence  of  multiple  threads  than  previous 
analyses.  Our  implementation  tends  to  scale  linearly  with  respect  to  the  number  of  threads 
(Section  7.7). 

In  the  rest  of  this  chapter,  we  only  consider  two  threads  because  the  extension  to  multiple 
threads  is  straightforward  for  round-robin  scheduling. 

7.1.3 Ability of the reduction to harness different analyses for CBA

The  reduction  introduces  assume  statements  and  symbolic  constants.  Any  sequential 
analysis  that  can  deal  with  these  two  features  can  be  extended  to  handle  concurrent  programs 
as  well  (under  a  context  bound). 

Any abstraction prepared to interpret program conditions can also handle assume
statements. Certain analyses, such as affine-relation analysis (ARA) over integers, cannot make use
of the reduction: the presence of assume statements makes the ARA problem undecidable
[67]. The reduction presented in Section 7.5 avoids introducing assume statements.

It  is  harder  to  make  a  general  claim  about  whether  most  sequential  analyses  can  handle 
symbolic  constants.  A  variable  initialized  with  a  symbolic  constant  can  be  treated  safely  as  an 
uninitialized  variable;  thus,  any  analysis  that  considers  all  possible  values  for  an  uninitialized 
variable  can,  in  some  sense,  accommodate  symbolic  constants. 

Another  place  where  symbolic  constants  are  used  in  sequential  analyses  is  to  construct 
summaries  for  recursive  procedures.  Eager  computation  of  a  procedure  summary  is  similar  to 
analyzing  the  procedure  while  assuming  symbolic  values  for  the  parameters  of  the  procedure. 

It  is  easy  to  see  that  our  reduction  applies  to  concurrent  programs  that  only  share  finite- 
state  data.  In  this  case,  the  symbolic  constants  can  only  take  on  a  finite  number  of  values. 
Thus,  any  sequential  analysis  can  be  extended  for  CBA  merely  by  enumerating  all  their 
values  (or  considering  them  lazily  using  techniques  similar  to  the  ones  presented  in  Section 
7.6).  This  implies  that  sequential  analyses  of  programs  with  pointers,  arrays,  and/or  integers 
can  be  extended  to  perform  CBA  of  such  programs  when  only  finite-state  data  (e.g.,  a  finite 
number  of  locks)  is  shared  between  the  threads. 

The  reduction  also  applies  when  the  shared  data  is  not  finite-state,  although  in  this  case 
the  values  of  symbolic  constants  cannot  be  enumerated.  For  instance,  the  reduction  can  take 
a  concurrent  numeric  program  (defined  as  one  having  multiple  threads,  each  manipulating 
some number of potentially unbounded integers), and produce a sequential numeric program.
Then most numeric analyses, such as polyhedral analysis [27], can be applied to the program.
Such  analyses  are  typically  able  to  handle  symbolic  constants. 

7.2  The  Reduction  for  Boolean  Programs 

For  ease  of  exposition,  we  assume  that  all  procedures  of  a  Boolean  program  have  the 
same  number  of  local  variables.  Furthermore,  the  global  variables  can  have  any  value  when 
program  execution  starts,  and  similarly  for  the  local  variables  when  a  procedure  is  invoked. 
Let  G  be  the  set  of  valuations  of  the  global  variables,  and  L  be  the  set  of  valuations  of  the 
local variables. A program data-state is an element of $G \times L$. Each program statement st
of the Boolean program can be associated with a relation $[\![st]\!] \subseteq (G \times L) \times (G \times L)$ such
that $(g_0, l_0, g_1, l_1) \in [\![st]\!]$ when the execution of st on the state $(g_0, l_0)$ can lead to the state
$(g_1, l_1)$.

7.2.1  Analysis  of  sequential  Boolean  programs 

In  this  section,  we  recall  analyses  for  sequential  Boolean  programs.  The  goal  of  analyzing 
Boolean  programs  is  to  compute  the  set  of  data-states  that  can  reach  a  program  node.  This 
is  done  using  the  rules  shown  in  Fig.  7.2  [6].  These  rules  follow  standard  interprocedural 
analyses [81, 88]. Let entry(f) denote the entry node of procedure f, proc(n) denote the
procedure that contains node n, and ep(n) denote entry(proc(n)); let exitnode(n) denote a
predicate on nodes that is true when n is the exit node of its procedure. Let Pr be the set of
procedures of the program, which includes a distinguished procedure main. The rules of
Fig. 7.2 compute three types of relations: $H_n(g_0, l_0, g_1, l_1)$ denotes the fact that if $(g_0, l_0)$ is
the data state at entry(n), then the data state $(g_1, l_1)$ can reach node n; $S_f$ is the summary
relation for procedure f, which captures the net transformation that an invocation of the
procedure can have on the global state; $R_n$ is the set of data states that can reach node n.
All  relations  are  initialized  to  be  empty. 

First phase: rules $\mathcal{R}_0$ to $\mathcal{R}_3$.

Second phase:

$$\frac{g \in G \quad l \in L}{R_{\mathrm{entry}(\mathit{main})}(g, l)}\ \mathcal{R}_4$$

$$\frac{R_{\mathrm{ep}(n)}(g_0, l_0) \quad H_n(g_0, l_0, g_1, l_1)}{R_n(g_1, l_1)}\ \mathcal{R}_5$$

$$\frac{R_n(g_0, l_0) \quad n \xrightarrow{\text{call } f()} m \quad l \in L}{R_{\mathrm{entry}(f)}(g_0, l)}\ \mathcal{R}_6$$

$$\frac{H_n(g_0, l_0, g_1, l_1) \quad n \xrightarrow{\text{call } f()} m \quad l_2 \in L}{H_{\mathrm{entry}(f)}(g_1, l_2, g_1, l_2)}\ \mathcal{R}_7$$

$$\frac{H_n(g_0, l_0, g_1, l_1)}{R_n(g_1, l_1)}\ \mathcal{R}_8$$

Figure 7.2 Rules for the analysis of Boolean programs.

Eager analysis. Rules $\mathcal{R}_0$ to $\mathcal{R}_6$ describe an eager analysis. The analysis proceeds in two
phases. In the first phase, the rules $\mathcal{R}_0$ to $\mathcal{R}_3$ are used to saturate the relations H and S.
In the next phase, this information is used to build the relation R using rules $\mathcal{R}_4$ to $\mathcal{R}_6$.

Lazy analysis. Let rule $\mathcal{R}'_0$ be the same as $\mathcal{R}_0$ but restricted to just the main procedure.
Then the rules $\mathcal{R}'_0, \mathcal{R}_1, \mathcal{R}_2, \mathcal{R}_3, \mathcal{R}_7, \mathcal{R}_8$ describe a lazy analysis. The rule $\mathcal{R}_7$ restricts the
analysis of a procedure to only those states it is called in. As a result, the second phase gets
simplified and consists of only the rule $\mathcal{R}_8$.

Practical implementations [6, 50] use BDDs to encode each of the relations H, S, and R
and the rule applications are changed into BDD operations. For example, rule $\mathcal{R}_1$ is simply
the relational composition of relations $H_n$ and $[\![st]\!]$, which can be implemented efficiently
using BDDs.
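
To make the relational-composition step concrete, the following Python sketch composes $H_n$ with the relation of a statement using explicit sets of tuples. This is an illustration only: explicit sets stand in for BDDs, and the function name `compose` and the tuple layouts are assumptions of the example.

```python
def compose(h_n, st_rel):
    """Relational composition of H_n with the relation of a statement st.

    h_n    : set of (g0, l0, g1, l1) tuples
    st_rel : set of (g1, l1, g2, l2) tuples
    Returns the set of (g0, l0, g2, l2) tuples contributed to the successor node.
    """
    # Index the statement relation by its source data-state for a cheap join.
    by_source = {}
    for (g1, l1, g2, l2) in st_rel:
        by_source.setdefault((g1, l1), set()).add((g2, l2))
    return {(g0, l0, g2, l2)
            for (g0, l0, g1, l1) in h_n
            for (g2, l2) in by_source.get((g1, l1), ())}
```

With a single Boolean global and a statement that negates it, `compose({(0, 0, 0, 0), (0, 0, 1, 0)}, {(0, 0, 1, 0), (1, 0, 0, 0)})` maps each reached global value to its negation while keeping the entry state in the first two components.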

7.2.2  Context-bounded  analysis  of  concurrent  Boolean  programs 

Concurrent Boolean programs were defined in Section 6.1. We can apply the reduction
presented in Section 7.1 on a concurrent Boolean program to obtain a sequential Boolean
program by making the following changes to the reduction: (i) the variable k is modeled
using a vector of log(K) Boolean variables, and the increment operation is implemented using
a simple Boolean circuit on these variables; (ii) the if conditions are modeled using assume
statements; and (iii) the symbolic constants are modeled using additional (uninitialized)
global variables that are not modified in the program. Running any sequential analysis
algorithm, and projecting out the values of the Kth set of global variables from $R_n$ gives the
precise set of reachable global states at node n in the concurrent program.
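
One possible form of the increment circuit from item (i) is a ripple-carry incrementer, sketched below in Python. The little-endian bit order, the helper names, and the choice of bit width are assumptions of this illustration; the width must be large enough for every value that k takes.

```python
def to_bits(value, width):
    """Little-endian vector of Booleans encoding a non-negative integer."""
    return [bool((value >> i) & 1) for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(1 << i for i, b in enumerate(bits) if b)


def increment(bits):
    """Add one using only XOR and AND, the operations a Boolean program offers."""
    out, carry = [], True
    for b in bits:
        out.append(b != carry)  # sum bit: b XOR carry
        carry = b and carry     # carry out: b AND carry
    return out
```

For example, `from_bits(increment(to_bits(5, 3)))` evaluates to 6; incrementing the all-ones vector wraps around to zero, so the width should be chosen to avoid that case.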

The worst-case complexity of analyzing a Boolean program $P$ is bounded by
$O(|P||G|^3|L|^2)$, where $|P|$ is the number of program statements. Thus, using our approach, a
concurrent Boolean program $P_c$ with $m$ threads, and $K$ execution contexts per thread (with
round-robin scheduling), can be analyzed in time $O(K|P_c|(K|G|^K)^3|L|^2|G|^K)$: the size of the
sequential program obtained from $P_c$ is $K|P_c|$; it has the same number of local variables, and
its global variables have $K|G|^K$ valuations. Additionally, the symbolic constants
can take $|G|^K$ valuations, adding an extra multiplicative factor of $|G|^K$. The
analysis scales linearly with the number of threads ($|P_c|$ is $O(m)$).

This  reduction  actually  applies  to  any  model  that  works  with  finite-state  data,  which 
includes  Boolean  programs  with  references  [8,  75].  In  such  models,  the  heap  is  assumed  to 
be  bounded  in  size.  The  heap  is  included  in  the  global  state  of  the  program,  hence,  our 
reduction  would  create  multiple  copies  of  the  heap,  initialized  with  symbolic  values.  Our 
experiments  (Section  7.7)  used  such  models. 

Such a process of duplicating the heap can be expensive when the number of heap
configurations that actually arise in the concurrent program is very small compared to the total
number  of  heap  configurations  possible.  The  lazy  version  of  our  algorithm  (Section  7.6) 
addresses  this  issue. 

7.3  The  Reduction  for  PDSs 

The motivation for presenting the reduction for PDSs is that it allows one to apply the
numerous algorithms developed for PDSs to concurrent programs under a context bound.
For instance, one can use backward analysis of PDSs to get a backward analysis on the
concurrent program, or even compute error projections (Chapter 5) of concurrent programs.

For each $\langle p, \gamma \rangle \hookrightarrow \langle p', u \rangle \in (\Delta_1 \cup \Delta_2)$ and for all $p_i \in P$, $k \in \{1, \cdots, K\}$:

$$\langle (k, p_1, \cdots, p_{k-1}, p, p_{k+1}, \cdots, p_K), \gamma \rangle \hookrightarrow \langle (k, p_1, \cdots, p_{k-1}, p', p_{k+1}, \cdots, p_K), u \rangle$$

For each $\gamma \in \Gamma_j$ and for all $p_i \in P$, $k \in \{1, \cdots, K\}$:

$$\langle (k, p_1, \cdots, p_K), \gamma \rangle \hookrightarrow \langle (k+1, p_1, \cdots, p_K), \gamma \rangle$$

$$\langle (K+1, p_1, \cdots, p_K), \gamma \rangle \hookrightarrow \langle (1, p_1, \cdots, p_K), e_{j+1}\, \gamma \rangle$$

Figure 7.3 PDS rules for $\mathcal{P}_s$.

Concurrent PDSs were defined in Section 6.1. In this section, we only consider concurrent
PDSs with two threads. Let the concurrent PDS be $(\mathcal{P}_1, \mathcal{P}_2)$, where $\mathcal{P}_i = (P, \Gamma_i, \Delta_i)$. Let
$\Rightarrow_{\mathcal{P}_i}$ be the transition system of $\mathcal{P}_i$, and let $\Rightarrow_i$ be its extension to configurations of the
concurrent PDS (also defined in Section 6.1). Let the context-switch bound be $2K - 1$, so that each
thread gets $K$ chances to execute.

We will reduce $(\mathcal{P}_1, \mathcal{P}_2)$ to a single PDS $\mathcal{P}_s = (P_s, \Gamma_s, \Delta_s)$. Let $P_s$ be the set of all $K + 1$
tuples whose first component is a number between 1 and $K$, and the rest are from the set
$P$, i.e., $P_s = \{1, \cdots, K\} \times P \times P \times \cdots \times P$. This set relates to the reduction from Section
7.1 as follows: an element $(k, p_1, \cdots, p_K) \in P_s$ represents that the value of the variable k is
$k$; and $p_i$ encodes a valuation of the variables $\mathit{Var}_G^i$. When $\mathcal{P}_s$ is in such a state, its rules
only modify $p_k$.

Let $e_i \in \Gamma_i$ be the starting node of the $i$th thread. Let $\Gamma_s$ be the disjoint union of $\Gamma_1$, $\Gamma_2$
and an additional symbol $\{e_3\}$. $\mathcal{P}_s$ does not have an explicit checking phase. The rules $\Delta_s$
are defined in Fig. 7.3.
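
The rule construction of Fig. 7.3 can be sketched directly in Python by enumerating tuples of control states. The encoding below is an assumption made for illustration: a PDS rule $\langle p, \gamma \rangle \hookrightarrow \langle p', u \rangle$ is the tuple `(p, gamma, p2, u)` with `u` a tuple of stack symbols, and `reduce_pds`, `gammas`, `deltas`, and `entries` are hypothetical names. The explicit enumeration is exponential in $K$ and only meant for tiny inputs.

```python
from itertools import product


def reduce_pds(P, gammas, deltas, entries, K):
    """Build the rules of the sequential PDS from a two-thread concurrent PDS.

    P       : set of shared control states
    gammas  : [Gamma_1, Gamma_2], the stack alphabets of the threads
    deltas  : [Delta_1, Delta_2], rule sets of (p, gamma, p2, u) tuples
    entries : [e1, e2, e3], thread entry symbols plus the extra symbol e3
    """
    rules = set()
    for j, (gamma_j, delta_j) in enumerate(zip(gammas, deltas)):
        for ps in product(sorted(P), repeat=K):
            for k in range(1, K + 1):
                # A thread rule acts on the k-th component only.
                for (p, gamma, p2, u) in delta_j:
                    if ps[k - 1] == p:
                        qs = ps[:k - 1] + (p2,) + ps[k:]
                        rules.add(((k,) + ps, gamma, (k,) + qs, u))
                # Context switch: move on to the next copy of the shared state.
                for gamma in gamma_j:
                    rules.add(((k,) + ps, gamma, (k + 1,) + ps, (gamma,)))
            # After K contexts, call the entry of the next thread (or e3).
            for gamma in gamma_j:
                rules.add(((K + 1,) + ps, gamma, (1,) + ps, (entries[j + 1], gamma)))
    return rules
```

With two control states, one stack symbol and one rule per thread, and $K = 2$, this produces 32 rules, which illustrates how quickly the explicit rule set grows and why a symbolic encoding is attractive.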

We deviate slightly from the reduction presented in Section 7.1 by changing the goto
statement, which passes control from the first thread to the second, into a procedure call.
This ensures that the stack of the first thread is left intact when control is passed to the
next thread. Furthermore, we assume that the PDSs cannot empty their stacks, i.e., it is
not possible that $\langle p, e_1 \rangle \Rightarrow^*_{\mathcal{P}_1} \langle p', \varepsilon \rangle$ or $\langle p, e_2 \rangle \Rightarrow^*_{\mathcal{P}_2} \langle p', \varepsilon \rangle$ for all $p, p' \in P$ (in other words, the
main procedure should not return). This can be enforced by introducing new symbols $e'_i$, $e''_i$
in $\mathcal{P}_i$ such that $e'_i$ calls $e_i$, pushing $e''_i$ on the stack, and ensuring that no rule can fire on $e''_i$.

Theorem 7.3.1. Starting execution of the concurrent PDS $(\mathcal{P}_1, \mathcal{P}_2)$ from the state
$\langle p, e_1, e_2 \rangle$ can lead to the state $\langle p', c_1, c_2 \rangle$ under the transition system $((\Rightarrow_1)^*; (\Rightarrow_2)^*)^K$
if and only if there exist states $p_2, \cdots, p_K \in P$ such that $\langle (1, p, p_2, \cdots, p_K), e_1 \rangle \Rightarrow^*_{\mathcal{P}_s}
\langle (1, p_2, p_3, \cdots, p_K, p'), e_3\, c_2\, c_1 \rangle$.

Note that the checking phase is implicit in the statement of Thm. 7.3.1. (One can also
make the PDS $\mathcal{P}_s$ have an explicit checking phase, starting at node $e_3$.) A proof is given in
Section 7.9.

Complexity. Using our reduction, one can find the set of all reachable configurations of the
concurrent PDS $(\mathcal{P}_1, \mathcal{P}_2)$ in time $O(K^2|P|^{2K}|\mathit{Proc}||\Delta_1 + \Delta_2|)$, where $|\mathit{Proc}|$ is the number of
procedures in the concurrent PDS$^2$ (see Section 7.9). Using backward reachability algorithms,
one can verify if a given configuration is reachable in time $O(K^3|P|^{2K}|\Delta_1 + \Delta_2|)$. Both
these complexities are asymptotically better than those of previous algorithms for PDSs
[77], including the one presented in Chapter 6. Note that the complexity for backward
reachability is linear in the program size $|\Delta_1 + \Delta_2|$.

A similar reduction works for multiple threads as well (under round-robin scheduling).
Moreover, the complexity of finding all reachable states under a bound of $nK$ with $n$ threads,
using a standard PDS reachability algorithm, is $O(K^3|P|^{4K}|\mathit{Proc}||\Delta|)$, where $|\Delta| = \sum_{i=1}^{n}|\Delta_i|$
is the total number of rules in the concurrent PDS.

This reduction produces a large number of rules ($O(|P|^K|\Delta|)$) in the resultant PDS, but
we can leverage work on symbolic PDSs (SPDSs) [85] to obtain a symbolic implementation.

$^2$ The number of procedures of a PDS is defined as the number of symbols appearing as the first of the
two stack symbols on the right-hand side of a call rule.

7.4 The Reduction for Symbolic PDSs

A symbolic pushdown system (SPDS) is a triple $(\mathcal{P}, G, \mathit{val})$, where $\mathcal{P} = (\{p\}, \Gamma, \Delta)$ is
a single-state PDS, $G$ is a finite set, and $\mathit{val} : \Delta \to (G \times G)$ assigns a binary relation on
$G$ to each PDS rule. $\mathit{val}$ is extended to a sequence of rules as follows: $\mathit{val}([r_1, \cdots, r_n]) =
\mathit{val}(r_1); \mathit{val}(r_2); \cdots; \mathit{val}(r_n)$. For a rule sequence $\sigma \in \Delta^*$ and PDS configurations $c_1$ and $c_2$, we
say $c_1 \Rightarrow^{\sigma} c_2$ if applying those rules on $c_1$ results in $c_2$. The reachability question is extended
to computing the join-over-all-paths (JOP) value between two sets of configurations:

$$\mathrm{JOP}(C_1, C_2) = \bigcup \{\mathit{val}(\sigma) \mid c_1 \Rightarrow^{\sigma} c_2,\ c_1 \in C_1,\ c_2 \in C_2\}$$

PDSs and SPDSs have equivalent theoretical power; each can be converted to the other.
SPDSs are used for efficiently analyzing PDSs. For a PDS $\mathcal{P} = (P, \Gamma, \Delta)$, one constructs
an SPDS as follows: it consists of a PDS $(\{p\}, \Gamma, \Delta')$ and $G = P$. The rules $\Delta'$ and their
assigned relations are defined as follows: for each $\gamma \in \Gamma$, $u \in \Gamma^*$, include rule $\langle p, \gamma \rangle \hookrightarrow \langle p, u \rangle$
with the relation $\{(p_1, p_2) \mid \langle p_1, \gamma \rangle \hookrightarrow \langle p_2, u \rangle \in \Delta\}$, if the relation is non-empty. The SPDS
captures all state changes in the relations associated with the rules. Under this conversion:
$\langle p_1, u_1 \rangle \Rightarrow^*_{\mathcal{P}} \langle p_2, u_2 \rangle$ if and only if $(p_1, p_2) \in \mathrm{JOP}(\{\langle p, u_1 \rangle\}, \{\langle p, u_2 \rangle\})$.

The  advantage  of  using  SPDSs  is  that  the  relations  can  be  encoded  using  BDDs,  and 
operations  such  as  relational  composition  and  union  can  be  performed  efficiently  using  BDD 
operations.  This  allows  scalability  to  large  data-state  spaces  [85].  (SPDSs  can  also  encode 
part  of  the  local  state  in  the  relations,  but  we  do  not  consider  that  issue  in  this  section.) 

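The reverse construction can be used to encode an SPDS as a PDS: given an SPDS
$((\{p\}, \Gamma, \Delta), G, \mathit{val})$, construct a PDS $\mathcal{P} = (G, \Gamma, \Delta')$ with rules: $\{\langle g_1, \gamma \rangle \hookrightarrow \langle g_2, u \rangle \mid r =
\langle p, \gamma \rangle \hookrightarrow \langle p, u \rangle,\ r \in \Delta,\ (g_1, g_2) \in \mathit{val}(r)\}$. Then $(g_1, g_2) \in \mathrm{JOP}(\{\langle p, u_1 \rangle\}, \{\langle p, u_2 \rangle\})$ if and
only if $\langle g_1, u_1 \rangle \Rightarrow^*_{\mathcal{P}} \langle g_2, u_2 \rangle$.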
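
The two conversions are easy to state on explicit rule sets, as the following Python sketch shows. It assumes a PDS rule is encoded as a tuple `(p1, gamma, p2, u)` and represents the SPDS by a dictionary from `(gamma, u)` to a set of state pairs (the single control state is left implicit); the function names are hypothetical, and explicit sets stand in for the BDD-encoded relations.

```python
from collections import defaultdict


def pds_to_spds(delta):
    """Group PDS rules by (gamma, u); the control-state change becomes the relation."""
    val = defaultdict(set)
    for (p1, gamma, p2, u) in delta:
        val[(gamma, u)].add((p1, p2))
    return dict(val)


def spds_to_pds(val):
    """Expand each symbolic rule into one PDS rule per pair in its relation."""
    return {(g1, gamma, g2, u)
            for (gamma, u), relation in val.items()
            for (g1, g2) in relation}
```

Only non-empty relations appear in the dictionary, and applying `spds_to_pds` to the result of `pds_to_spds` returns the original rule set.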

Context-bounded  analysis  of  concurrent  SPDSs 

A concurrent SPDS with two threads consists of two SPDSs $\mathcal{S}_1 = ((\{p\}, \Gamma_1, \Delta_1), G, \mathit{val}_1)$
and $\mathcal{S}_2 = ((\{p\}, \Gamma_2, \Delta_2), G, \mathit{val}_2)$ with the same set $G$. The transition relation $\Rightarrow_c = (\Rightarrow_1^*; \Rightarrow_2^*)^K$,
which describes all paths in the concurrent PDS for $2K - 1$ context switches, is defined
in the same manner as for PDS, using the transition relations of the two PDSs. Let $\langle p, e_1, e_2 \rangle$
be the starting configuration of the concurrent SPDS. The problem of interest is to compute
the following relation for a given set of configurations $C$:

$$R_C = \mathrm{JOP}(\langle p, e_1, e_2 \rangle, C) = \bigcup \{\mathit{val}(\sigma) \mid \langle p, e_1, e_2 \rangle \Rightarrow_c^{\sigma} c,\ c \in C\}.$$

A concurrent SPDS can be reduced to a single SPDS using the constructions presented
earlier: (i) convert the SPDSs $\mathcal{S}_i$ to PDSs $\mathcal{P}_i$; (ii) convert the concurrent PDS system $(\mathcal{P}_1, \mathcal{P}_2)$
to a single PDS $\mathcal{P}_s$; and (iii) convert the PDS $\mathcal{P}_s$ to an SPDS $\mathcal{S}_s$. The rules of $\mathcal{S}_s$ will have
binary relations on the set $G^K$ (the $K$-fold Cartesian product of $G$). Recall that the rules of $\mathcal{P}_s$
change the global state in only one component. Thus, the BDDs that represent the relations
of rules in $\mathcal{S}_s$ would only be $\log(K)$ times larger than the BDDs for relations in $\mathcal{S}_1$ and $\mathcal{S}_2$
(the identity relation on $n$ elements can be represented with a BDD of size $\log(n)$ [85]).

Let $C' = \{\langle p, e_3\, u_2\, u_1 \rangle \mid \langle p, u_1, u_2 \rangle \in C\}$. On $\mathcal{S}_s$, one can solve for the value
$R = \mathrm{JOP}(\langle p, e_1 \rangle, C')$. Then
$R_C = \{(g, g') \mid ((g, g_2, \ldots, g_K), (g_2, \ldots, g_K, g')) \in R\}$ (note the
similarity to Thm. 7.3.1).

7.5  The  Reduction  for  WPDSs 

A  concurrent  WPDS  is  defined  as  a  set  of  WPDSs,  one  for  each  thread.  The  problem  of 
CBA  for  concurrent  WPDSs  was  defined  in  Section  6.1. 

In  this  section,  we  only  consider  concurrent  WPDSs  with  two  threads.  Moreover,  we 
restrict  each  WPDS  to  have  a  single  control  state  in  the  underlying  PDS.  A  WPDS  that 
does  not  satisfy  this  restriction  can  be  converted  to  one  that  does  satisfy  it  by  appropriately 
changing  the  weight  domain  (similar  to  the  conversion  from  a  PDS  to  a  symbolic  PDS). 

Let the concurrent WPDS be $(\mathcal{W}_1, \mathcal{W}_2)$, where $\mathcal{W}_i = (\mathcal{P}_i, \mathcal{S}, f_i)$ and $\mathcal{P}_i = (\{p\}, \Gamma_i, \Delta_i)$.
Let $K$ be the number of execution contexts per thread. We will reduce $(\mathcal{W}_1, \mathcal{W}_2)$ to a single
WPDS $\mathcal{W}_s$ over a different weight domain using the tensor-product operation. Let $\mathcal{S}^K$ be
the $K$th-STP (Defn. 6.5.1) of $\mathcal{S}$. For weight domains, the tensor-product operation serves
the same role as the duplication of shared variables that was used in Section 7.1 to keep
track of the shared state at each context switch. The DeTensor operation will serve the role
of the Checker.

For each rule $\langle p, \gamma \rangle \hookrightarrow \langle p, u \rangle \in (\Delta_1 \cup \Delta_2)$ with weight $w$ and for all $k \in N_K$:

    $\langle k, \gamma \rangle \hookrightarrow \langle k, u \rangle$ with weight $ec(k, w)$

For each $\gamma \in \Gamma_j$ and for all $k \in N_K$:

    $\langle k, \gamma \rangle \hookrightarrow \langle k + 1, \gamma \rangle$ with weight $\overline{1}$
    $\langle K + 1, \gamma \rangle \hookrightarrow \langle 1, e_{j+1}\, \gamma \rangle$ with weight $\overline{1}$

Figure 7.4 WPDS rules for $\mathcal{W}_s$.

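Let $N_K = \{1, 2, \ldots, K\}$. The WPDS $\mathcal{W}_s$ is defined as $(\mathcal{P}_s, \mathcal{S}^K, f_s)$, where
$\mathcal{P}_s = (N_K, \Gamma_s, \Delta_s)$. (The control states of $\mathcal{P}_s$ will keep track of the current execution context.)
Define a function $ec : N_K \times \mathcal{S} \to \mathcal{S}^K$ as follows:

$$ec(i, w) = \odot(\underbrace{\overline{1}, \ldots, \overline{1}}_{i-1},\ w,\ \underbrace{\overline{1}, \ldots, \overline{1}}_{K-i})$$

That is, $ec(i, w)$ takes the tensor of $K$ weights, where $w$ appears in the $i$th position.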
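As a rough illustration of how $ec$ places a weight, the sketch below models a tensor of $K$ weights as a plain Python tuple. This is a simplifying assumption: the actual tensor-product operation of Defn. 6.5.1 belongs to the weight domain and may identify more elements than a tuple does. The function signature, including the extra parameters `K` and `one`, is hypothetical.

```python
def ec(i, w, K, one="1"):
    """Return a K-tuple holding w in position i (1-based) and the
    semiring one everywhere else.

    The tuple only stands in for the tensor of K weights; `one` is a
    placeholder for the weight domain's multiplicative identity.
    """
    if not 1 <= i <= K:
        raise ValueError("execution-context index out of range")
    return (one,) * (i - 1) + (w,) + (one,) * (K - i)


second_of_four = ec(2, "w", 4)
```

Under this reading, a rule of either thread fired in execution context `i` only affects the `i`th component of the tensor and leaves the other components at the identity weight.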

Let $e_i \in \Gamma_i$ be the start node of the $i$th thread. Let $\Gamma_s$ be the disjoint union of $\Gamma_1$, $\Gamma_2$ and
an additional symbol $\{e_3\}$. $\mathcal{W}_s$ does not have an explicit checking phase. The rules $\Delta_s$ are
defined in Fig. 7.4.

As in Section 7.3, we assume that the PDSs $\mathcal{P}_1$ and $\mathcal{P}_2$ cannot empty their stacks, i.e., it
is not possible that $\langle p, e_1 \rangle \Rightarrow^*_{\mathcal{P}_1} \langle p, \varepsilon \rangle$ or $\langle p, e_2 \rangle \Rightarrow^*_{\mathcal{P}_2} \langle p, \varepsilon \rangle$.

Theorem 7.5.1. In the concurrent WPDS $(\mathcal{W}_1, \mathcal{W}_2)$, the net effect of all paths
that go from $\langle p, e_1, e_2 \rangle$ to $\langle p, c_1, c_2 \rangle$ with $K$ execution contexts per thread is exactly
$\mathrm{DeTensor}(\mathrm{IJOP}_{\mathcal{W}_s}(\langle 1, e_1 \rangle, \langle 1, e_3\, c_2\, c_1 \rangle))$, where the DeTensor operation is for the $K$-STP
$\mathcal{S}^K$.

7.6  Lazy  CBA  of  Concurrent  Boolean  Programs

In the reduction presented in Section 7.2, the analysis of the generated sequential program
had to assume all possible values for the symbolic constants. The lazy analysis has the
property that at any time, if the analysis considers the $K$-tuple $(g_1, \ldots, g_K)$ of valuations of
the symbolic constants, then there is at least one valid execution of the concurrent program
in which the global state is $g_i$ at the end of the $i$th execution context of the first thread, for
all $1 \le i \le K$.

The idea is to iteratively build up the effect that each thread can have on the global
state in its $K$ execution contexts. Note that $T_1^s$ (or $T_2^s$) does not need to know the values
of $\mathrm{Var}_G^i$ when $i > k$. Hence, the analysis proceeds by making no assumptions on the
values of $\mathrm{Var}_G^i$ when $i > k$. When $k$ is incremented to $k + 1$ in the analysis of $T_1^s$, it
consults a table $E^2$ that stores the effect that $T_2^s$ can have in its first $k$ execution contexts.
Using that table, it figures out a valuation of $\mathrm{Var}_G^{k+1}$ to continue the analysis of $T_1^s$, and
stores the effect that $T_1^s$ can have in its first $k$ execution contexts in table $E^1$. These
tables are built iteratively. More precisely, if the analysis can deduce that $T_1^s$, when started
in state $(1, g_1, \ldots, g_k)$, can reach the state $(k, g'_1, \ldots, g'_k)$, and $T_2^s$, when started in state
$(1, g'_1, \ldots, g'_k)$, can reach $(k, g_2, g_3, \ldots, g_k, g_{k+1})$, then an increment of $k$ in $T_1^s$ produces the
global state $s = (k + 1, g'_1, \ldots, g'_k, g_{k+1})$. Moreover, $s$ can be reached when $T_1^s$ is started in
state $(1, g_1, \ldots, g_{k+1})$ because $T_1^s$ could not have touched $\mathrm{Var}_G^{k+1}$ before the increment that
changed $k$ to $k + 1$. The algorithm is shown in Fig. 7.5. The entities used in it have the
following meanings:

•  Let $\overline{G} = \bigcup_{i=1}^{K} G^i$, where $G$ is the set of global states. An element from the set $\overline{G}$ is
written as $\overline{g}$. Let $L$ be the set of local states.

•  The relation $H^j_n$ is related to program node $n$ of the $j$th thread. It is a subset of
$\overline{G} \times \{1, \ldots, K\} \times \overline{G} \times L \times \{1, \ldots, K\} \times \overline{G} \times L$. If $H^j_n(\overline{g}_0, k_1, \overline{g}_1, l_1, k_2, \overline{g}_2, l_2)$ holds, then
each of the $\overline{g}_i$ is an element of $G^{k_2}$ (i.e., a $k_2$-tuple of global states), and the thread $T_j$
is in its $k_2$th execution context. Moreover, if the valuation of $\mathrm{Var}_G^i$, $1 \le i \le k_2$, was $\overline{g}_0$
when $T_j^s$ (the reduction of $T_j$) started executing, and if the node $ep(n)$, the entry node
of the procedure containing $n$, could be reached in data state $(k_1, \overline{g}_1, l_1)$, then $n$ can
be reached in data state $(k_2, \overline{g}_2, l_2)$, and the variables $\mathrm{Var}_G^i$, $i > k_2$, are not touched
(hence, there is no need to know their values). Note that this means that $ep(n)$ was
reached in the $k_1$th execution context of $T_j$ and $n$ was reached in the $k_2$th execution
context.

•  The relation $S_f$ captures the summary of procedure $f$.

•  The relations $E^j$ store the effect of executing a thread. If $E^j(k, \overline{g}_0, \overline{g}_1)$ holds, then
$\overline{g}_0, \overline{g}_1 \in G^k$, and the execution of thread $T_j^s$, starting from $\overline{g}_0$, can lead to $\overline{g}_1$ without
touching variables in $\mathrm{Var}_G^i$, $i > k$.

•  The function $check((g_1, \ldots, g_k), (g'_1, \ldots, g'_k))$ returns $g'_k$ if $g_{i+1} = g'_i$ for $1 \le i \le k - 1$,
and is undefined otherwise. This function checks for the correct transfer of the global
state from $T_2$ to $T_1$ at a context switch.

•  Let $[(g_1, \ldots, g_i), (g_{i+1}, \ldots, g_j)] = (g_1, \ldots, g_j)$. We sometimes write $g$ to mean $(g)$, i.e.,
$[(g_1, \ldots, g_i), g] = (g_1, \ldots, g_i, g)$.
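The last two entities are simple enough to sketch directly. In the Python below, tuples of global states stand for elements of $G^k$, `None` models "undefined" (an assumption that presumes `None` is never itself a global state), and `concat` is a hypothetical name for the bracket notation.

```python
def check(g_in, g_out):
    """check((g_1..g_k), (g'_1..g'_k)): return g'_k when g_{i+1} == g'_i
    for all 1 <= i <= k-1, and None (undefined) otherwise."""
    if not g_in or len(g_in) != len(g_out):
        return None
    if g_in[1:] == g_out[:-1]:
        return g_out[-1]
    return None


def concat(a, b):
    """[a, b]: concatenate two tuples of global states. A bare global
    state g is treated as the one-element tuple (g,)."""
    a = a if isinstance(a, tuple) else (a,)
    b = b if isinstance(b, tuple) else (b,)
    return a + b
```

For instance, `check(("a", "b", "c"), ("b", "c", "d"))` yields `"d"`, because each state handed over matches the state assumed at the start of the next execution context, whereas `check(("a", "b"), ("x", "d"))` is undefined because `"b"` and `"x"` disagree.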

Understanding the rules. The rules $\mathcal{R}'_1$, $\mathcal{R}'_2$, $\mathcal{R}'_3$, and $\mathcal{R}'_7$ describe intra-thread computation,
and are similar to the corresponding unprimed rules in Fig. 7.2. The rule $\mathcal{R}_{10}$ initializes
the variables for the first execution context of $T_1$. The rule $\mathcal{R}_{12}$ initializes the variables for
the first execution context of $T_2$. The rules $\mathcal{R}_8$ and $\mathcal{R}_9$ ensure proper hand-off of the global
state from one thread to another. These two are the only rules that change the value of
$k$. For example, consider rule $\mathcal{R}_8$. It ensures that the global state at the end of the $k_2$th
execution context of $T_2$ is passed to the $(k_2 + 1)$th execution context of $T_1$, using the function
$check$. The value $g$ returned by this function represents a reachable valuation of the global
variables when $T_1$ starts its $(k_2 + 1)$th execution context.

The following theorem shows that the relations $E^1$ and $E^2$ are built lazily, i.e., they only
contain relevant information. A proof is given in Section 7.9.

Figure 7.5 Rules for lazy analysis of concurrent Boolean programs with two threads.

Theorem 7.6.1. After running the algorithm described in Fig. 7.5,
$E^1(k, (g_1, \ldots, g_k), (g'_1, \ldots, g'_k))$ and $E^2(k, (g'_1, \ldots, g'_k), (g_2, \ldots, g_k, g))$ hold if and only
if there is an execution of the concurrent program with $2k - 1$ context switches that starts in
state $g_1$ and ends in state $g$, and the global state is $g_i$ at the start of the $i$th execution context
of $T_1$ and $g'_i$ at the start of the $i$th execution context of $T_2$. The set of reachable global states
of the program in $2K - 1$ context switches consists of all $g \in G$ such that $E^2(K, \overline{g}_1, [\overline{g}_2, g])$ holds.

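Multiple threads. In the presence of multiple threads, we fix round-robin scheduling, and
impose a bound $K$ on the number of execution contexts per thread.

The analysis rules remain similar to the ones for two threads, with $E^i$ relations summarizing
the behavior of the $i$th thread. The only difference is the following: in the presence
of two threads, for a thread, say $T_1$, one only needs to consult $E^2$ to find the global state
for the next execution context (rule $\mathcal{R}_8$). In the presence of $r$ threads, $r > 2$, for a thread
$T_i$, one needs to consult each of $E^{i+1}, E^{i+2}, \ldots, E^r, E^1, \ldots, E^{i-1}$, in order. For this, we
build relations $E^i_{\mathrm{after}}$ and $E^i_{\mathrm{before}}$ that summarize the effect of $T_{i+1}, \ldots, T_r$ and $T_1, \ldots, T_{i-1}$,
respectively.

$$\frac{1 \le k \le K \quad \overline{g} \in G^k}{E^r_{\mathrm{after}}(k, \overline{g}, \overline{g})}\ \mathcal{R}_{13}
\qquad
\frac{1 \le k \le K \quad \overline{g} \in G^k}{E^1_{\mathrm{before}}(k, \overline{g}, \overline{g})}\ \mathcal{R}_{14}$$

$$\frac{E^i(k, \overline{g}_0, \overline{g}_1) \quad E^i_{\mathrm{after}}(k, \overline{g}_1, \overline{g}_2) \quad 1 < i \le r}{E^{i-1}_{\mathrm{after}}(k, \overline{g}_0, \overline{g}_2)}\ \mathcal{R}_{15}
\qquad
\frac{E^i_{\mathrm{before}}(k, \overline{g}_0, \overline{g}_1) \quad E^i(k, \overline{g}_1, \overline{g}_2) \quad 1 \le i < r}{E^{i+1}_{\mathrm{before}}(k, \overline{g}_0, \overline{g}_2)}\ \mathcal{R}_{16}$$

$$\frac{E^i_{\mathrm{before}}(1, g_0, g_1) \quad l \in L \quad e = entry(T_i)}{H^i_e(g_1, 1, g_1, l, 1, g_1, l)}\ \mathcal{R}_{17}$$

$$\frac{H^i_n(\overline{g}_0, k_1, \overline{g}_1, l_1, k_2, \overline{g}_2, l_2) \quad E^i_{\mathrm{after}}(k_2, \overline{g}_2, \overline{g}_3) \quad E^i_{\mathrm{before}}(k_2 + 1, [g_4, \overline{g}_3], [\overline{g}_0, g])}{H^i_n([\overline{g}_0, g], k_1, [\overline{g}_1, g], l_1, k_2 + 1, [\overline{g}_2, g], l_2)}\ \mathcal{R}_{18}$$

Figure 7.6 Rules for lazy analysis of concurrent Boolean programs with $r$ threads.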

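The analysis rules for multiple threads include $\mathcal{R}'_1$, $\mathcal{R}'_2$, $\mathcal{R}'_3$, $\mathcal{R}'_7$, and $\mathcal{R}_{11}$ from Fig. 7.5. The
rest of the rules are shown in Fig. 7.6. Rules $\mathcal{R}_{13}$ and $\mathcal{R}_{14}$ initialize $E^r_{\mathrm{after}}$ and $E^1_{\mathrm{before}}$ to the
identity relation, respectively. Rules $\mathcal{R}_{15}$ and $\mathcal{R}_{16}$ compute these relations compositionally.
Rule $\mathcal{R}_{17}$ generalizes rules $\mathcal{R}_{10}$ and $\mathcal{R}_{12}$ of Fig. 7.5. Rule $\mathcal{R}_{18}$ generalizes rules $\mathcal{R}_8$ and $\mathcal{R}_9$ of
Fig. 7.5. Note that the use of $check$ in $\mathcal{R}_8$ is made implicitly in $\mathcal{R}_{18}$. For instance, consider
the case when $i = 1$ in $\mathcal{R}_{18}$. Then $E^1_{\mathrm{before}}$ is the identity relation on the global-state vectors.
Thus, $[g_4, \overline{g}_3] = [\overline{g}_0, g]$, which implies that $g = check(\overline{g}_0, \overline{g}_3)$.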

7.7  Experiments 

We implemented both the lazy and eager analyses for concurrent Boolean programs
by extending the model checker Moped [50]. These implementations find the set of all
reachable states of the shared memory after a given number of context switches. We could
have implemented the eager version using a source-to-source transformation; however, we
took a different approach because it allows us to switch easily between the lazy and eager
versions. Both versions are based on the rules shown in Fig. 7.5.

In the lazy version, the rules are applied in the following order: (i) the $H$ relations
are saturated for execution context $k$; (ii) then the $E$ relations are computed for $k$; (iii)
then rules $\mathcal{R}_8$ and $\mathcal{R}_9$ are used to initialize the $H$ relations for execution context $k + 1$ and
the process is repeated. In this way, the first step can be performed using the standard
(sequential) reachability algorithm of Moped. Thm. 7.6.1 allows us to find the reachable
states directly from the $E$ relations.
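The three-step loop can be illustrated on a much simpler setting than the one Moped handles. The sketch below is an explicit-state toy, not the implementation described here: it assumes two threads without procedures or stacks, each given as a successor function over (global, local) states, and it enumerates tuples instead of using BDDs. The names `closure`, `lazy_cba`, `step1`, and `step2` are hypothetical.

```python
def closure(step, start):
    """All (global, local) states reachable from `start` by zero or more
    steps of one thread, i.e. within a single execution context."""
    seen, work = {start}, [start]
    while work:
        g, l = work.pop()
        for nxt in step(g, l):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def lazy_cba(step1, step2, g_init, l1_init, l2_init, K):
    """Global states reachable with K execution contexts per thread under
    the schedule T1; T2; T1; T2; ..."""
    # H entries are (inputs, outputs, local): the global state at the start
    # of each context so far, at the end of each context so far, and the
    # thread's current local state. E projects away the local state.
    H1 = {((g_init,), (g,), l)
          for g, l in closure(step1, (g_init, l1_init))}
    E1 = {(i, o) for i, o, _ in H1}
    H2 = {(o1, (g,), l)
          for _, o1 in E1
          for g, l in closure(step2, (o1[0], l2_init))}
    E2 = {(i, o) for i, o, _ in H2}
    for _ in range(K - 1):
        # T1 resumes only from a global state that T2 is known to produce.
        H1 = {(i + (o2[-1],), o + (g,), l_new)
              for i, o, l in H1
              for i2, o2 in E2
              if i2 == o and i[1:] == o2[:-1]
              for g, l_new in closure(step1, (o2[-1], l))}
        E1 = {(i, o) for i, o, _ in H1}
        # T2 resumes only from a global state that T1 is known to produce.
        H2 = {(i + (o1[-1],), o + (g,), l_new)
              for i, o, l in H2
              for i1, o1 in E1
              if o1[:-1] == i and i1[1:] == o
              for g, l_new in closure(step2, (o1[-1], l))}
        E2 = {(i, o) for i, o, _ in H2}
    return {o[-1] for _, o in E2}


def step1(g, l):
    """T1: set g from 0 to 1, then wait for g == 2 and set it to 3."""
    if l == 0 and g == 0:
        return [(1, 1)]
    if l == 1 and g == 2:
        return [(3, 2)]
    return []


def step2(g, l):
    """T2: wait for g == 1 and set it to 2."""
    if l == 0 and g == 1:
        return [(2, 1)]
    return []
```

In this toy program the global value 3 needs T1 to run, then T2, then T1 again, so `lazy_cba(step1, step2, 0, 0, 0, 1)` gives `{0, 1, 2}` while a bound of 2 gives `{0, 1, 2, 3}`. No valuation of the global state is ever guessed: every tuple stored in the tables comes from a state that the other thread was already shown to produce.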

The eager version is implemented in a similar fashion, except that it uses a fixed set of $E$
relations that include all possible global state changes. Once the $H$ relations are computed,
as described above, then the $E$ relations are reinitialized using rule $\mathcal{R}_{11}$. Next, the following
rule, which encodes the Checker phase, computes the set of reachable states (assuming that
$K$ is the given bound on the number of execution contexts).

$$\frac{E^1(K, \overline{g}_0, \overline{g}_1) \quad E^2(K, \overline{g}_1, \overline{g}_2) \quad g = check(\overline{g}_0, \overline{g}_2)}{Reachable(g)}\ \text{Checker}$$
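Read operationally, the Checker rule joins the two effect relations and filters out inconsistent guesses. A minimal Python sketch, assuming the relations for the final bound $K$ are available as explicit sets of tuple pairs (the real implementation is symbolic) and using the hypothetical name `checker`:

```python
def checker(E1_K, E2_K):
    """Apply the Checker rule to the effect relations at the bound K.

    E1_K holds pairs (g0, g1) and E2_K holds pairs (g1, g2), where each
    component is a K-tuple of global states. A final state g2[-1] is
    reachable when the tuples chain up and every state T2 hands over
    equals the state T1 assumed at the start of its next context.
    """
    reachable = set()
    for g0, g1 in E1_K:
        for h1, g2 in E2_K:
            if h1 == g1 and g0[1:] == g2[:-1]:
                reachable.add(g2[-1])
    return reachable


# Hypothetical relations for K = 2. The second E1 entry guessed 5 as the
# start of T1's second context, but T2 hands over 2, so it is discarded.
E1 = {((0, 2), (1, 3)), ((0, 5), (1, 3))}
E2 = {((1, 3), (2, 4))}
states = checker(E1, E2)
```

The discarded entry is the kind of work the eager analysis performs for guesses that can never be confirmed, which the lazy analysis avoids by construction.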

Our implementation supports any number of threads. It uses round-robin scheduling
with a bound on the number of execution contexts per thread, as described in Section 7.1.2.

All of our experiments, discussed below, were performed on a 2.4GHz machine with
3.4GB RAM running Linux version 2.6.18-92.1.17.el5.

BlueTooth  driver  model.  First,  we  report  the  results  for  a  model  of  the  BlueTooth 
driver,  which  has  been  used  in  several  past  studies  [78,  19,  89].  The  driver  model  can  have 
multiple  threads,  where  each  thread  requests  the  addition  or  the  removal  of  devices  from  the 
system,  and  checks  to  see  if  a  user-defined  assertion  can  fail.  We  used  this  model  to  test 
the  scalability  of  our  tool  with  respect  to  the  number  of  threads,  as  well  as  the  number  of 
execution  contexts  per  thread.  The  results  are  shown  in  Fig.  7.7.  The  model  has  8  shared 
global  variables,  at  most  7  local  variables  per  procedure,  5  procedures  but  no  recursion,  and 
37  program  statements. 

It is interesting to note that the eager analysis is faster than the lazy analysis in some
cases (when there are a large number of threads or execution contexts). The running times
for symbolic techniques need not be proportional to the number of states explored: even
when the eager analysis explores more behaviors than the lazy version, its running time is
shorter because it is able to exploit more symmetry in the search space, and the resulting
BDDs are small.

Figure 7.7 Experiments with the BlueTooth driver model. Each thread tries to either start
or stop the device. (a) Running time when the number of execution contexts per thread is
fixed at 4. (b) Running time when the number of threads is fixed at 3.

The graph in Fig. 7.7(b) shows the exponential dependence of the running time on the
number of execution contexts. The graph in Fig. 7.7(a) shows the expected linear dependence
of the running time on the number of threads, until the number of threads is 8. We believe
that the sharp increase is due to BDDs getting large enough so that operations on them do
not entirely fit inside the BDD cache.

Binary  search  tree.  We  also  measured  the  performance  of  our  techniques  on  a  model 
of  a  concurrent  binary  search  tree,  which  was  also  used  in  [89].  (Because  our  model  was 
hand-coded,  and  the  model  used  in  [89]  was  automatically  extracted  from  Java  code,  our 
results  are  not  directly  comparable.)  This  model  has  a  finite  heap,  and  a  thread  either  tries 
to  insert  a  value,  or  search  for  it  in  the  tree.  The  model  has  72  shared  global  variables, 
at  most  52  local  variables  per  procedure,  15  procedures,  and  155  program  statements.  The 
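model uses recursion.

| Inserters | Searchers | 2    | 3    | 4     | 5      | 6      |
|-----------|-----------|------|------|-------|--------|--------|
| 1         | 1         | 6.1  | 21.6 | 84.5  | 314.8  | 1054.8 |
| 2         | 1         | 11.9 | 46.8 | 211.9 | 832.0  | 2995.6 |
| 2         | 2         | 14.1 | 64.4 | 298.0 | 1255.4 | 4432.1 |

Figure 7.8 Lazy context-bounded analysis of the binary search tree model. The table
reports the running time, in seconds, for various configurations. The first two columns
give the number of inserter and searcher threads; the remaining columns are indexed by
the number of execution contexts per thread.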

The  eager  version  of  the  algorithm  timed  out  on  this  model  for  most  settings.  This  may 
be  because  the  analysis  has  to  consider  symbolic  operations  on  the  heap,  which  results  in 
huge  BDDs.  The  results  for  the  lazy  version  are  reported  in  Fig.  7.8.  They  show  trends 
similar  to  the  BlueTooth  driver  model:  a  linear  increase  in  running  time  according  to  the 
number  of  threads,  and  an  exponential  increase  in  running  time  according  to  the  number  of 
execution  contexts  per  thread. 
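One way to see the exponential trend in the number of execution contexts is to compute the ratio between consecutive entries in each row of Fig. 7.8. The short script below does this with the running times copied from that figure; the variable and function names are ours, and the script is an illustration rather than part of the experimental setup.

```python
# Running times (seconds) from Fig. 7.8, keyed by (inserters, searchers);
# list entries correspond to 2, 3, 4, 5, and 6 execution contexts per thread.
times = {
    (1, 1): [6.1, 21.6, 84.5, 314.8, 1054.8],
    (2, 1): [11.9, 46.8, 211.9, 832.0, 2995.6],
    (2, 2): [14.1, 64.4, 298.0, 1255.4, 4432.1],
}


def growth_factors(row):
    """Ratio of each running time to the one for one fewer context."""
    return [later / earlier for earlier, later in zip(row, row[1:])]


factors = {cfg: growth_factors(row) for cfg, row in times.items()}

for cfg, fs in factors.items():
    print(cfg, [round(f, 2) for f in fs])
```

Every ratio falls between 3 and 5, so each additional execution context multiplies the running time by a roughly constant factor, which is the signature of exponential growth.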

BEEM  benchmark  suite.  The  third  set  of  experiments  consisted  of  common  concurrent 
algorithms,  for  which  finite,  non-recursive  models  were  obtained  from  the  BEEM  benchmark 
suite  [73].  We  hand-translated  some  of  the  SPIN  models  into  the  input  language  of  MOPED. 
These  models  do  not  exploit  the  full  capabilities  of  our  tool  because  they  all  have  a  single 
procedure.  We  use  these  models  for  a  more  comprehensive  evaluation  of  our  tool.  All 
examples  that  we  picked  use  a  large  number  of  threads.  As  before,  the  eager  version  timed 
out  for  most  settings,  and  we  report  the  results  for  the  lazy  version. 

The benchmark suite also has buggy versions of each of the test examples. The bugs
were introduced by perturbing the constants in the correct version by ±1 or by changing
comparison operators (e.g., > to ≥, or vice versa). Interestingly, the bugs were found within
a budget of 2 or 3 execution contexts per thread. (Note that this may still involve multiple
context switches.)

The results are reported in Fig. 7.9. To put the numbers in perspective, we also give the
time required by Spin to enumerate all reachable states of the program. These are finite-state
models, meant for explicit-state model checkers; however, Spin ran out of memory on
three of the eight examples.

The CBA techniques presented in this chapter, unlike explicit-state model checkers, do
not look for repetition of states: if a state has been reached within $k$ context switches, then
it need not be considered again if it shows up after $k + i$ context switches. In general, for
recursive programs, this is hard to check because the number of states that can arise after a
context switch may be infinite. However, it would still be interesting to explore techniques
that can rule out some repetitions.

Predicate-abstraction based Boolean programs. Our fourth set of experiments uses
the concurrent Boolean programs generated using the predicate abstraction performed by
DDVerify [96]. We use this set of experiments to validate two hypotheses: first, most bugs
manifest themselves in a few context switches; second, our tool, based on CBA, remains
competitive with current verification tools when it is given a reasonable bound on the number
of context switches.

First, we briefly describe how DDVerify operates. When given C source code,
DDVerify performs predicate abstraction to produce an abstract model of the original
program. This model is written out as a concurrent Boolean program and fed to the model
checker Boppo, or in the input language of Smv and fed to Smv. DDVerify uses Smv
by default because it performs better than Boppo on concurrent models [96]. If the model
checker is able to prove the correctness of all assertions in the model, the entire process
succeeds (no bugs). If the model checker returns a counterexample, then it is checked
concretely, and if it is spurious, then the abstraction is refined to create a new abstract model
and this repeats. DDVerify checks for a number of different properties on the source code
separately. (The abstract models produced by DDVerify have a single procedure and very
few local variables. Thus, these experiments do not exploit the full capabilities of CBA
either.)




| Name                    | Inst | #gvars | #lvars | #Threads | #EC | Time (s) | Spin (s) |
|-------------------------|------|--------|--------|----------|-----|----------|----------|
| Anderson N=6,ERROR=0    | pos  | 11     | 4      | 6        | 2   | 52.46    | OOM      |
| Anderson N=6,ERROR=1    | neg  | 11     | 4      | 6        | 2   | 54.90    | OOM      |
| Bakery N=4,MAX=7        | pos  | 17     | 7      | 4        | 2   | 5.87     | 28.5     |
| Bakery N=4,MAX=5        | neg  | 17     | 7      | 4        | 2   | 13.88    | 44.2     |
| Peterson N=4            | pos  | 25     | 7      | 4        | 3   | 5.46     | 3.05     |
| Peterson N=4,ERROR=1    | neg  | 25     | 7      | 4        | 3   | 25.72    | OOM      |
| Msmie N=5,S=10,M=10     | pos  | 23     | 1      | 20       | 2   | 47.94    | 31.0     |
| Msmie N=5,S=10,M=10     | neg  | 13     | 1      | 13       | 2   | 1.29     | 1.04     |

Figure 7.9 Experiments on finite-state models obtained from the BEEM benchmark suite.
The names, along with the given parameter values, uniquely identify the program in the
test suite. The columns, in order, report: the name; buggy (neg) or correct (pos) version;
number of shared variables; number of local variables per thread; number of threads;
execution context budget per thread; running time of our tool in seconds; and the time
needed by Spin to enumerate the entire state space. “OOM” stands for Out-Of-Memory.

We now describe the experimental setup. We chose 6 drivers among the ones provided
with the distribution of DDVerify. For each driver, we chose some properties at random
and let DDVerify run normally using Smv as its model checker, but we saved the Boolean
programs that it produced at each iteration. For each driver and each property, we collected
the Smv files and the Boolean programs produced during the last iteration. The experiments
were conducted on these files. We gave our tool a budget of 2 threads and 4 execution contexts
per thread. A scatter plot of the running times is shown in Fig. 7.10 and the aggregate times
for proving all chosen properties for a given driver are reported in Fig. 7.11.

Figure 7.10 Scatter plot of the running times of our tool (CBA) against Smv on the files
obtained from DDVerify. Different dots are used for the cases when the files had a bug
(neg) and when they did not have a bug (pos). For the “neg” dots, the number of context
switches before a bug was found is shown alongside the dot. The median speedup was
about 30×. Lines indicating 1× and 30× speedups are also shown as dashed and dotted
lines, respectively.

Two things should be noted from the results. First, whenever a model was buggy, our tool
could find it within the budget given to it. This validates our hypothesis that bugs manifest
in few context switches. Second, our tool was much faster than Smv on these benchmarks,
with speedups of up to 120×; our tool was slower on only one example. As shown by the
dotted line in Fig. 7.10, the median speedup was about 30×.

| Name          | Inst | Time (s) | Smv (s) | Speedup | #cs   |
|---------------|------|----------|---------|---------|-------|
| applicom      | pos  | 1.3      | 147.1   | 117.7   |       |
|               | neg  | 33.0     | 147.1   | 15.1    | [1,3] |
| generic_nvram | pos  | 2.3      | 88.1    | 38.3    |       |
| gpio          | pos  | 280.6    | 92.8    | 0.33    |       |
|               | neg  | 106.8    | 299.6   | 2.8     | [1,6] |
| machzwd       | pos  | 0.9      | 103.4   | 120.2   |       |
|               | neg  | 85.6     | 4214.7  | 49.2    | [0,6] |
| nwbutton      | pos  | 0.19     | 2.8     | 14.6    |       |
|               | neg  | 0.88     | 26.1    | 29.7    | [1,6] |
| toshiba       | pos  | 3.8      | 197.1   | 52.4    |       |
|               | neg  | 192.8    | 243.43  | 1.3     | [1,6] |

Figure 7.11 Experiments on concurrent Boolean programs obtained from DDVerify. The
columns, in order, report: the name of the driver; buggy (neg) or correct (pos) version, as
determined by Smv; running time of our tool in seconds; the running time of Smv; speedup
of our tool against Smv; and the range of the number of context switches after which a bug
was found. Each row summarizes the time needed for checking multiple properties.

7.8  Related  Work

A reduction from concurrent programs to sequential programs was given in [78] for the
case of two threads and two context switches (it has a restricted extension to multiple threads
as well). In such a case, the only thread interleaving is $T_1; T_2; T_1$. The context switch from $T_1$
to $T_2$ is simulated by a procedure call. Then $T_2$ is executed on the program stack of $T_1$, and
at the next context switch, the stack of $T_2$ is popped off to resume execution in $T_1$. Because
the stack of $T_2$ is destroyed, the analysis cannot return to $T_2$ (hence the context bound of
2). Their algorithm cannot be generalized to an arbitrary context bound.

A symbolic algorithm for context-bounded analysis was presented recently by
Suwimonteerabuth et al. [89]. An earlier algorithm by Qadeer and Rehof [77] required enumeration
of all reachable global states at a context switch. Suwimonteerabuth et al. identify places
where such an enumeration is not required, essentially by finding different abstract states
that the program model cannot distinguish. This enables symbolic computation to some
extent. However, in the worst case, the algorithm still requires enumeration of all reachable
states.

Analysis of message-passing concurrent systems, as opposed to ones having shared memory,
has been considered in [19]. They bound the number of messages that can be communicated,
similar to bounding the number of contexts.

There  has  been  a  large  body  of  work  on  verification  of  concurrent  programs.  Some 
recent  work  is  [38,  76].  However,  CBA  is  different  because  it  allows  for  precise  analysis  of 
complicated  program  models,  including  recursion.  As  future  work,  it  would  be  interesting 
to  explore  CBA  with  the  abstractions  used  in  the  aforementioned  work. 

7.9  Proofs 

7.9.1  Proof  of  Thm.  7.3.1 

($\Leftarrow$) First, we show that a path of the concurrent program can be simulated by a
path in the sequential program. (In this proof, we will deviate from the notation of the
theorem to make the proof more clear.) Let $c_0 = e_1$ and $d_0 = e_2$. If the configuration
$\langle p_0, c_0, d_0 \rangle$ can lead to $\langle p_{2K}, c_K, d_K \rangle$ under the transition system $(\Rightarrow_1^*; \Rightarrow_2^*)^K$, then we
show that there exist states $p_2, p_4, \ldots, p_{2K-2} \in P$ such that
$\langle (1, p_0, p_2, \ldots, p_{2K-2}), c_0 \rangle \Rightarrow^*_{\mathcal{P}_s} \langle (1, p_2, p_4, \ldots, p_{2K}), e_3\, d_K\, c_K \rangle$.

If a sequence of rules $\sigma$ takes a configuration $c$ to a configuration $d$ under the transition
system $\Rightarrow$, then we say $c \Rightarrow^{\sigma} d$. For a rule $r \in \Delta_j$, $r = \langle p, \gamma \rangle \hookrightarrow \langle p', u \rangle$, let
$r^s[k, p_1, \ldots, p_{k-1}, p_{k+1}, \ldots, p_K] \in \Delta_s$ be the rule
$\langle (k, p_1, \ldots, p_{k-1}, p, p_{k+1}, \ldots, p_K), \gamma \rangle \hookrightarrow \langle (k, p_1, \ldots, p_{k-1}, p', p_{k+1}, \ldots, p_K), u \rangle$.
We extend this notation to rule sequences as well,
and drop the $p_i$ when they are clear from the configuration the rules are applied on. Let
$r_{inc}[k]$ stand for a rule of $\mathcal{P}_s$ that increments the value of $k$ (note that it can fire with anything
on the top of the stack). Let $r_{1 \to 2}$ stand for the rules that call from the first PDS to the
second, and $r_{2 \to 3}$ stand for the rules that call $e_3$.

A path in $(\Rightarrow_1^*; \Rightarrow_2^*)^K$ can be broken down at each switch from $\Rightarrow_1^*$ to $\Rightarrow_2^*$ and from
$\Rightarrow_2^*$ to $\Rightarrow_1^*$. Hence, there must exist $c_i, d_i$, $1 \le i \le K - 1$; $p_j$, $1 \le j \le 2K - 1$; and $\sigma_h$,
$1 \le h \le 2K$, such that a path in the concurrent program can be broken down as shown in
Fig. 7.12(a). Then the path shown in Fig. 7.12(b) is a valid run of $\mathcal{P}_s$ that establishes the
required property.

Figure 7.12 Simulation of a concurrent PDS run by a single PDS. For clarity, we write $\Rightarrow$
to mean $\Rightarrow_{\mathcal{P}_s}$ in (b).

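($\Rightarrow$) For the reverse direction, a path $\sigma$ in $\Rightarrow_{\mathcal{P}_s}$ from $\langle (1, p_0, p_2, \ldots, p_{2K-2}), c_0 \rangle$ to
$\langle (1, p_2, p_4, \ldots, p_{2K}), e_3\, d_K\, c_K \rangle$ can be broken down as $\sigma = \sigma_A\, r_{1 \to 2}\, \sigma_B\, r_{2 \to 3}$. (This is because
one must use the rules $r_{1 \to 2}$ and $r_{2 \to 3}$, in order, to push $e_3$ on the stack, after which no rules
can fire.) Hence we must have the following (for some states $p_1, p_3, \ldots, p_{2K-1}$).

$$\begin{aligned}
\langle (1, p_0, p_2, \ldots, p_{2K-2}), c_0 \rangle
  &\Rightarrow^{\sigma_A}_{\mathcal{P}_s} \langle (K + 1, p_1, p_3, \ldots, p_{2K-1}), c_K \rangle \\
  &\Rightarrow^{r_{1 \to 2}}_{\mathcal{P}_s} \langle (1, p_1, p_3, \ldots, p_{2K-1}), d_0\, c_K \rangle \\
  &\Rightarrow^{\sigma_B}_{\mathcal{P}_s} \langle (K + 1, p_2, p_4, \ldots, p_{2K}), d_K\, c_K \rangle \\
  &\Rightarrow^{r_{2 \to 3}}_{\mathcal{P}_s} \langle (1, p_2, p_4, \ldots, p_{2K}), e_3\, d_K\, c_K \rangle
\end{aligned}$$

Because $\sigma_A$ changes the value of $k$ from 1 to $K + 1$, it must have $K$ uses of $r_{inc}$.
Hence, it can be written as: $\sigma_A = \sigma_1^s[1]\ r_{inc}[1]\ \sigma_3^s[2]\ r_{inc}[2] \cdots r_{inc}[K - 1]\ \sigma_{2K-1}^s[K]\ r_{inc}[K]$.
Because only $\sigma^s[i]$ can change the $i$th state component, we must have the following:

$$\begin{aligned}
\langle (1, p_0, p_2, \ldots, p_{2K-2}), c_0 \rangle
  &\Rightarrow^{\sigma_1^s[1]} \langle (1, p_1, p_2, \ldots, p_{2K-2}), c_1 \rangle \\
  &\Rightarrow^{r_{inc}[1]} \langle (2, p_1, p_2, \ldots, p_{2K-2}), c_1 \rangle \\
  &\ \ \vdots \\
  &\Rightarrow^{\sigma_{2K-1}^s[K]} \langle (K, p_1, p_3, \ldots, p_{2K-1}), c_K \rangle \\
  &\Rightarrow^{r_{inc}[K]} \langle (K + 1, p_1, p_3, \ldots, p_{2K-1}), c_K \rangle
\end{aligned}$$

Similarly, $\sigma_B = \sigma_2^s[1]\ r_{inc}[1]\ \sigma_4^s[2]\ r_{inc}[2] \cdots r_{inc}[K - 1]\ \sigma_{2K}^s[K]\ r_{inc}[K]$. The reader can
verify that the rule sequence $\sigma_1\, \sigma_2 \cdots \sigma_{2K-1}\, \sigma_{2K}$ describes a path in $(\Rightarrow_1^*; \Rightarrow_2^*)^K$ and takes
the configuration $\langle p_0, c_0, d_0 \rangle$ to $\langle p_{2K}, c_K, d_K \rangle$.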


7.9.2  Complexity  argument  for  Thm.  7.3.1 

A PDS can have an infinite number of configurations. Hence, sets of configurations are
represented using automata [85]. We do not go into the details of such automata, but
only present the running-time complexity arguments. Given an automaton $A$ and a PDS
$(P_{in}, \Gamma_{in}, \Delta_{in})$, the set of configurations forward reachable from those represented by $A$ can
be calculated in time $O(|P_{in}||\Delta_{in}|(|Q| + |P_{in}||Proc_{in}|) + |P_{in}||\rightarrow_A|)$, where $Q$ is the set of
states of $A$, and $\rightarrow_A$ is the set of its transitions [85]. We call the algorithm from [85] poststar,
and its output, which is also an automaton, poststar($A$).
For the PDS $\mathcal{P}_s$, obtained from a concurrent PDS with $n$ threads $(\mathcal{P}_1, \mathcal{P}_2, \dots, \mathcal{P}_n)$,
$|P_s| = K|P|^K$, $|\Delta_s| = K|P|^{K-1}|\Delta|$, and $|Proc_s| = |Proc|$, where $\Delta = \bigcup_{i=1}^{n} \Delta_i$ and $|Proc| =
\sum_{i=1}^{n} |Proc_i|$. To obtain the set of forward reachable configurations from $(p, e_1, e_2, \dots, e_n)$,
we will solve poststar($A$) for each $A$ that represents the singleton set of configurations
$\{((1, p, p_2, \dots, p_K), e_1)\}$; i.e., $|P|^{K-1}$ separate calls to poststar. In the result, we can project
out all configurations that do not have $(1, p_2, \dots, p_K, p')$ as their state, for some $p'$. Directly
using the above complexity result, we get a total running time of $O(K^3|P|^{4K}|\Delta||Proc|)$. For
the case of two threads, we use a more sophisticated argument to calculate the running time.
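
The multi-thread bound can be checked by direct substitution into the poststar complexity stated above. The worked step below is a sketch under one assumption that the text does not state explicitly: for the singleton initial automata used here, $|Q|$ and $|\rightarrow_A|$ are dominated by the $|P_s||Proc_s|$ term.

```latex
\begin{align*}
|P_s|\,|\Delta_s|\,\bigl(|Q| + |P_s|\,|Proc_s|\bigr)
  &= K|P|^{K} \cdot K|P|^{K-1}|\Delta| \cdot O\bigl(K|P|^{K}|Proc|\bigr) \\
  &= O\bigl(K^{3}\,|P|^{3K-1}\,|\Delta|\,|Proc|\bigr)
  && \text{(one call to poststar)} \\
|P|^{K-1} \cdot O\bigl(K^{3}|P|^{3K-1}|\Delta||Proc|\bigr)
  &= O\bigl(K^{3}\,|P|^{4K-2}\,|\Delta|\,|Proc|\bigr)
   \subseteq O\bigl(K^{3}\,|P|^{4K}\,|\Delta|\,|Proc|\bigr)
  && \text{(all $|P|^{K-1}$ calls)}
\end{align*}
```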

When asking for the set of reachable configurations of $\mathcal{P}_s$, we are only interested in
some particular configurations: when starting from $((1, p, p_2, \dots, p_K), e_1)$, we only want
configurations of the form $((1, p_2, \dots, p_K, p'), u)$. Hence, when we run poststar, starting
from the above configuration, we remove some rules from $\Delta_s$: we remove all rules with left-hand
side $\langle (k, p'_2, p'_3, \dots, p'_K, p'), \gamma \rangle$ if $\gamma \in \Gamma_2$ and $p_i \ne p'_i$ for some $i$ between 1 and $k-1$,
both inclusive. We statically know that removing such rules would not affect the result.

Further, we make two observations about the algorithm from [85]: (i) if an automaton
$A$ is split into two automata $A_1$ and $A_2$, such that the union of the transitions (represented
configurations) of $A_1$ and $A_2$ equals the set of transitions (represented configurations) of $A$,
then the running time of poststar($A$) is strictly smaller than the sum of the running
times of poststar($A_1$) and poststar($A_2$); (ii) if the set of PDS rules $\Delta$ is split into two sets ($\Delta_1$
and $\Delta_2$) such that no rule in $\Delta_1$ can fire after a rule of $\Delta_2$ is applied, then the running time of
poststar$_{\Delta_2}$(poststar$_{\Delta_1}$($A$)) is the same as the running time of poststar$_{\Delta}$($A$), where the poststar
algorithm is subscripted with the set of rules it operates on. Using these two observations,
we show that running poststar using $\mathcal{P}_s$ takes less time than the above-mentioned complexity.

Let $\Delta^i \subseteq \Delta_s$ be the set of rules that operate when the first component of the state
(the value of $k$) is $i$, and $\Delta_{call} \subseteq \Delta_s$ be the set of rules that call to $e_2$ (from $\Gamma_1$)
or $e_3$. We know that any path in $\mathcal{P}_s$ can be decomposed into a rule sequence from
$S = \Delta^{1*}\, \Delta^{2*} \cdots \Delta^{K*}\, \Delta_{call}\, \Delta^{1*}\, \Delta^{2*} \cdots \Delta^{K*}\, \Delta_{call}$. Using observation (ii) above, we break
the running of poststar on $\Delta_s$ into a series operating on each of the above sets, in order.
Iter   Num           $|\rightarrow|$               Time                         Split   $|Q|$
1      1             1                         $|P|^2|\Delta||Proc|$            $|P|$   $|P||Proc|$
2      $|P|$         $|P||\Delta||Proc|$           $2|P|^2|\Delta||Proc|$           $|P|$   $2|P||Proc|$
$i$    $|P|^{i-1}$   $(i-1)|P||\Delta||Proc|$      $2(i-1)|P|^2|\Delta||Proc|$      $|P|$   $i|P||Proc|$
$K$    $|P|^{K-1}$   $(K-1)|P||\Delta||Proc|$      $2(K-1)|P|^2|\Delta||Proc|$      $|P|$   $K|P||Proc|$

Table 7.1 Running times for different stages of poststar on $\mathcal{P}_s$.

Next, after running poststar on one of the $\Delta^i$, we split the resultant automaton $A$ into as many
automata as the number of states in the configurations of $A$; e.g., if $A$ represents the set
$\{(p_1, c_1), (p_2, c_2), (p_2, c_3)\}$, then we split it into two automata representing the sets $\{(p_1, c_1)\}$
and $\{(p_2, c_2), (p_2, c_3)\}$, respectively. Observation (i) shows that this splitting only increases
the running time.
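
The splitting step can be pictured on explicit sets of configurations. The sketch below is only an illustration: the actual algorithm operates on automata that represent (possibly infinite) sets, whereas here a configuration is a hypothetical `(state, stack)` pair of strings and `split_by_state` is a name introduced for this example.

```python
from collections import defaultdict

def split_by_state(configs):
    """Group (state, stack) configurations by their control state."""
    groups = defaultdict(set)
    for state, stack in configs:
        groups[state].add((state, stack))
    # One set per distinct control state, in a deterministic order.
    return [groups[s] for s in sorted(groups)]

# The example from the text: {(p1, c1), (p2, c2), (p2, c3)}.
configs = {("p1", "c1"), ("p2", "c2"), ("p2", "c3")}
parts = split_by_state(configs)
```

The two resulting sets cover the original set exactly, which is the precondition of observation (i).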

Tab. 7.1 shows the running time for performing poststar on the first $K$ of the $\Delta^i$ from $S$.
The column “Iter” shows which $\Delta^i$ is being processed. The column “Num” is the number
of poststar invocations that have to be run using $\Delta^i$. The column “$|\rightarrow|$” shows the upper bound on
the number of transitions in the automaton poststar is run on. The column “Time” is the
running time of poststar on such automata. The column “Split” is an upper bound on the
number of automata the result is split into, and the last column is the number of states
in each of the resultant automata. For example, there are $|P|^{i-1}$ invocations of
poststar with rule set $\Delta^i$, each on an automaton with at most $(i-1)|P||\Delta||Proc|$ transitions,
taking time $2(i-1)|P|^2|\Delta||Proc|$. Each result is split into $|P|$ different automata, each with
at most $i|P||Proc|$ states. The reader can inductively verify the correctness of the table.

Thus, this requires a total running time of $O(K|P|^{K+1}|\Delta||Proc|)$. Next, we use the rules
in $\Delta_{call}$ and repeat the above process for the last $K$ of the sequence $S$. However, in this case,
no splitting is necessary, because we know the desired target state, and have already removed
some rules from $\Delta_s$. For example, suppose that the initial state chosen was $(1, p, p_2, \dots, p_K)$, and that after
performing the computation of Tab. 7.1 we obtain an automaton $A$ that has the single state
$(1, p'_1, \dots, p'_K)$ for all configurations represented by it. After processing $A$ with $\Delta^1$, suppose
the result is $A'$. There is no need to split $A'$ because of the rules removed from $\Delta^2$. The
rules of $\Delta^2$ would only fire on configurations that have the state $(2, p_2, p'_2, p'_3, \dots, p'_K)$. Thus,
splitting is not necessary, and the time required to process each of the $|P|^{K-1}$ automata
obtained from Tab. 7.1 using $\Delta^i$ is $2(K+i-1)|P|^2|\Delta||Proc|$. Hence, the time required
to process the entire $S$ is $O(K^2|P|^{K+1}|\Delta||Proc|)$. Because we have to repeat for $|P|^{K-1}$
initial states, the running time of poststar on $\mathcal{P}_s$ with two threads can be bounded by
$O(K^2|P|^{2K}|\Delta||Proc|)$.
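
The $O(K|P|^{K+1}|\Delta||Proc|)$ total for the first $K$ stages can be sanity-checked by summing the rows of Tab. 7.1 numerically. The sketch below assumes the row formulas as read from the table (row 1 is listed without the factor of 2); the function names and the constant 4 in the comparison are our own choices for illustration, not part of the original argument.

```python
def table_7_1_rows(K, P, D, Proc):
    """Rows (i, num, time) of Tab. 7.1 for |P| = P, |Delta| = D, |Proc| = Proc."""
    rows = []
    for i in range(1, K + 1):
        num = P ** (i - 1)  # number of poststar invocations at stage i
        if i == 1:
            time = P**2 * D * Proc                 # row 1 as listed in the table
        else:
            time = 2 * (i - 1) * P**2 * D * Proc   # rows i >= 2
        rows.append((i, num, time))
    return rows

def first_half_total(K, P, D, Proc):
    """Total time for the first K stages: sum over rows of num * time."""
    return sum(num * time for _, num, time in table_7_1_rows(K, P, D, Proc))

# For |P| >= 2 the total stays within 4 * K * |P|^(K+1) * |Delta| * |Proc|.
example_total = first_half_total(3, 2, 1, 1)
```

The last stage dominates the sum because the number of invocations grows geometrically in $|P|$ while the per-invocation time grows only linearly in $i$.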

Backward analysis from a set of configurations represented by an automaton $A$ with $|Q|$
states can be performed in time $O(K|P|^{2K}(K|P|^K + |Q|)^2|\Delta|)$ for multiple threads, and
$O(K|P|^{2K}(K|P| + |Q|)^2|\Delta|)$ for two threads.

7.9.3  Proof  of  Thm.  7.6.1 

For proving Thm. 7.6.1, we will make use of the fact that our reduction to a (sequential)
Boolean program is correct. Let $T_1^s$ be the reduction of the first thread, and $T_2^s$ be the
reduction of the second thread. First, we show that given an execution $\rho$ of $T_1^s$, and certain
facts about $E^2$ (which summarizes the effect of the second thread), $\rho$ can be simulated by
the subset of rules from Fig. 7.5 that apply to the first thread. Formally, suppose that $\rho$ is
the execution shown in Fig. 7.13 (where $n_0$ is the entry point of the thread).

The execution $\rho$ is broken at the points where the value of $k$ is incremented. Note that
this execution implies that in the concurrent program the global state, when $T_1$ begins its
$i$th execution context, is $g_i$, and when $T_2$ begins its $i$th execution context, it is $g'_i$. Further,
suppose that the following facts hold: $E^2(i, (g'_1, g'_2, \dots, g'_i), (g_2, g_3, \dots, g_{i+1}))$ for $1 \le i \le k-1$.
Given these, we will show that rules for the first thread can be used to establish that
$H^1_{n_k}((g_1, \dots, g_k), k_1, g, l, k, (g'_1, \dots, g'_k), l')$ holds, for some $k_1$, $g$, $l$ and $l'$.

Corresponding to the execution $\rho$, there would be a sequence of deductions, using the
rules from Fig. 7.2 on $T_1^s$, that derives the state at $n_k$. These rules simply perform an
interprocedural analysis on $T_1^s$ (the symbolic constants can take any value when program
execution starts). We formalize the notion of using these rules on $T_1^s$. Let the rules operate
on the relations $H^s$ and $S^s$. These relations are of the form $H^s_n([k_1, g_1], l_1, [k_2, g_2], l_2)$, which
semantically means that if the data state at ep($n$) was $([k_1, g_1], l_1)$, then the data state at $n$
can be $([k_2, g_2], l_2)$; the summary relation would be $S^s_f([k_1, g_1], [k_2, g_2])$. For a statement
st in $T_1$, its translation in $T_1^s$ encodes the transformer:

$\{((k, g_1, \dots, g_k, \dots, g_K), l, (k, g_1, \dots, g'_k, \dots, g_K), l') \mid (g_k, l, g'_k, l') \in st,\ 1 \le k \le K\}$

Additionally, one has a self-loop edge associated with a transformer that increments the
value of $k$: $\{([k, g], l, [k+1, g], l) \mid 1 \le k \le K\}$. Given a proof tree $\pi$ for $\rho$, we build a proof
tree $\pi'$ using rules of Fig. 7.5 by induction on the bottom-most rule of $\pi$.

$n_0$     : $(1, g_1, g_2, \dots, g_k, g_{k+1}, \dots, g_K, l_0)$
$n_1$     : $(1, g'_1, g_2, \dots, g_k, g_{k+1}, \dots, g_K, l_1)$
$n_1$     : $(2, g'_1, g_2, \dots, g_k, g_{k+1}, \dots, g_K, l_1)$
$n_2$     : $(2, g'_1, g'_2, \dots, g_k, g_{k+1}, \dots, g_K, l_2)$
  $\vdots$
$n_{k-1}$ : $(k-1, g'_1, g'_2, \dots, g'_{k-1}, g_k, g_{k+1}, \dots, g_K, l_{k-1})$
$n_k$     : $(k, g'_1, g'_2, \dots, g'_k, g_{k+1}, \dots, g_K, l_k)$

Figure 7.13 An execution in $T_1^s$.

When $k = 1$ in $\rho$, the conversion is straightforward: just replace a rule $\mathcal{R}$ in $\pi$ with
the primed rule $\mathcal{R}'$ from Fig. 7.5. An example is shown in Fig. 7.14 for a program path
$n_0 \xrightarrow{st_1} n_1 \xrightarrow{\text{call } f} n_2$, where the call to $f$ takes the path $n_3 \xrightarrow{st_2} n_4$. Let $(g_1, \dots, g_{k+1})|_k =
(g_1, \dots, g_k)$.

Figure 7.14 An example of converting from proof $\pi$ to proof $\pi'$. For brevity, we use st to
mean a statement in the thread $T_1$ (and not its translated version in $T_1^s$).

The induction hypothesis is as follows: given $\rho$, as shown in Fig. 7.13, if there is a
proof tree $\pi$ that derives $H^s_{n_k}([k_1, g], l, [k, (g'_1, \dots, g'_k, g_{k+1}, \dots, g_K)], l')$ then one can derive
$H^1_{n_k}((g_1, \dots, g_k), k_1, g|_k, l, k, (g'_1, \dots, g'_k), l')$. Note that in this case, the last $(K - k_1)$ components
of $g$ must be $(g_{k_1+1}, \dots, g_K)$ because $T_1^s$ could not have modified them. We have already
proved the base case above. Fix $g_{init} = (g_1, \dots, g_k)$ and $g_{final} = (g'_1, \dots, g'_k, g_{k+1}, \dots, g_K)$.

The bottom-most rule of $\pi$ can be $\mathcal{R}_1$, $\mathcal{R}_2$ or $\mathcal{R}_7$. For the rule $\mathcal{R}_1$, one can either use a
statement transformer, or increment the value of $k$. All these cases, and the way to obtain
$\pi'$, are shown in Fig. 7.15.

One can prove a similar result for $T_2^s$. Note that $H^1_n(g_{init}, k_1, g|_k, l, k, g_{final}|_k, l')$ implies
$E^1(k, g_{init}, g_{final}|_k)$. Thus, these results are sufficient to prove one side of the theorem: given
an execution of the concurrent program, we can obtain executions of $T_1^s$ and $T_2^s$, and then
use the above results together to show that the rules in Fig. 7.5 can simulate the execution
of the concurrent program.

Going  the  other  way  is  similar.  A  deduction  on  H1  can  be  converted  into 
an  interprocedural  path  of  T(.  The  rule  77g  corresponds  to  incrementing  the 
value of $k$, and must be used a bounded number of times in a derivation of
an $H^1$ fact. The $E^2$ assumptions used in a derivation have to be of the form
$E^2(1, g'_1, g_2), E^2(2, (g'_1, g'_2), (g_2, g_3)), \dots, E^2(i, (g'_1, \dots, g'_i), (g_2, \dots, g_{i+1}))$. This is because the
second component of $E^1$ is only extended, but never modified, and once $k$ is incremented, the
first $k$ components cannot be modified either. Now, we can use the conversions of Fig. 7.15
in the opposite direction to prove the reverse direction of the theorem.
Figure 7.15 Simulation of run $\rho$ using rules in Fig. 7.5. In case (a), $g_k = \mathrm{check}(g_{init}|_{k-1},
(g_2, \dots, g_k))$ and $g'_k = g_k$ (because $\rho$ does not edit this set of variables). In case (d),
exitnode($m$) holds, $f$ = proc($m$), $k_1 \le k_2 \le k$, the $k_2+1$ to $k$ components of $g'$ are
$(g_{k_2+1}, \dots, g_k)$ because it arises when $k = k_2$, and the $k_1+1$ to $k$ components of $g$ are
$(g_{k_1+1}, \dots, g_k)$ for the same reason.
Chapter 8
Conclusions 

A program-verification technique aims to gain certain knowledge about a program’s behavior
to determine whether some program execution can be faulty, or whether no faulty
executions are possible. Such techniques are becoming increasingly important as software
gets larger, more complex, and often hard to reason about manually. This dissertation gives
several techniques that can reason about two important aspects of a program: procedures
(and procedure calls) and concurrency.

We follow the common design of program-verification tools in which verification is split
into two phases: an abstraction phase, which produces an abstract model, and an analysis
phase, which precisely reasons about the abstract model. The contributions of this dissertation
are to give expressive abstract models that can easily encode programs with procedures
and concurrency, and efficient analysis algorithms for these models. Thus, to solve a new
verification problem, one only needs to encode the problem using one of our abstract models,
and then analyze the model using one of our algorithms.

Analysis  of  Sequential  Programs 

In Chapter 3, we defined Extended Weighted Pushdown Systems (EWPDSs). We demonstrated
the power of EWPDSs by showing that several problems can be solved using EWPDSs,
including Boolean program verification, affine-relation analysis, and single-level alias
analysis. We gave efficient algorithms for analyzing EWPDSs. One of the advantages of using
EWPDSs is that they support stack-qualified queries. In our previous work [56], we showed
the importance of using stack-qualified queries in the context of debugging: the stack trace
at the point of a program crash is an important clue about what the program execution did
before failing. Obtaining information from the program model that is specific to the stack
trace requires a stack-qualified query.

In Chapter 4, we gave an algorithm, called FWPDS, for faster analysis of WPDSs and
EWPDSs. Because FWPDS applies to abstract models, it improves the running time of
any application based on these abstract models. We observed 1.8× to 3.6× speedups in
three different applications that used EWPDSs, without requiring any application-specific
fine-tuning. These applications were: (i) the debugging application mentioned above, which
searches for a particular path in the control-flow graph of a program; (ii) an analysis that
finds a set of affine relations in x86 programs; and (iii) an assertion checker for Boolean
programs. In Chapter 5, we showed how to answer more expressive queries on EWPDSs,
which compute what we call error projections. An error projection is the set of all nodes
that lie on an error trace in the abstract model. Computing an error projection can help
speed up abstraction-refinement-based techniques.
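
To make the definition concrete, the sketch below computes an error projection for a plain directed graph, where a node lies on an error trace exactly when it is both forward reachable from the entry and backward reachable from the error node. This is a deliberate simplification and not the algorithm of Chapter 5: it ignores procedure calls and weights, and the graph encoding and function names are assumptions made for this example.

```python
from collections import deque

def reachable(graph, start):
    """Nodes reachable from start by following edges in graph (BFS)."""
    seen, work = {start}, deque([start])
    while work:
        node = work.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen

def reverse(graph):
    """Graph with every edge flipped."""
    rev = {}
    for node, succs in graph.items():
        for succ in succs:
            rev.setdefault(succ, set()).add(node)
    return rev

def error_projection(graph, entry, error):
    """Nodes on some path from entry to error (unweighted, single procedure)."""
    forward = reachable(graph, entry)
    backward = reachable(reverse(graph), error)
    return forward & backward

# "b" cannot reach the error node, and "c" is not reachable from the entry.
graph = {"entry": {"a", "b"}, "a": {"err"}, "b": {"exit"}, "c": {"err"}}
projection = error_projection(graph, "entry", "err")
```

An abstraction-refinement loop could then restrict its attention to the nodes in `projection`, since no other node can contribute to an error trace in this model.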

All of the techniques mentioned above, namely EWPDSs, FWPDS, and error projections,
are implemented as a library and available for download as part of the WALi package
[47]. We have also addressed the problem of speeding up multiple (E)WPDS queries [57],
and that work is included with WALi as well.

Analysis  of  Concurrent  Programs 

The above work is on interprocedural analysis of sequential programs. In Chapters 6 and
7, we presented techniques for the analysis of multi-procedure concurrent programs. Because
such analyses are undecidable, even for simple abstractions, we explored the area of
context-bounded analysis (CBA), where the number of context switches between different threads is
bounded. We showed that given an interprocedural analysis for sequential programs, one can
automatically extend it to perform CBA of concurrent programs, under certain conditions.

In Chapter 6, we showed that when each thread of a concurrent program is modeled
using a WPDS, and a tensor-product operation exists for the weights, CBA of the program
can be carried out effectively. The algorithm for CBA has two key steps. First, each thread
is analyzed separately to build a weighted transducer that captures the effect of executing
the thread without interruption from other threads. In particular, a thread $T$ is converted
into a weighted transducer $\tau_T$ that represents the following relation:

$\{(s_1, s_2) \mid$ execution of $T$ starting from state $s_1$ can lead to state $s_2\}$.

We  also  showed  how  to  construct  such  a  transducer  for  a  WPDS.  This  provides  a  strong 
characterization  of  the  behaviors  of  a  WPDS.  Second,  the  transducers  from  each  of  the 
threads  are  composed  as  many  times  as  the  context  bound  K,  resulting  in  the  net  effect 
of  executing  the  concurrent  program  for  K  context  switches.  From  this,  one  can  find  the 
set  of  all  reachable  states  within  K  context  switches  and  verify  properties  of  the  program 
under  that  bound.  The  importance  of  this  result  is  that  one  has  to  do  little  work  to  obtain 
an  algorithm  for  CBA:  one  has  to  show  that  each  thread  can  be  (soundly)  modeled  using  a 
WPDS  (which  one  would  have  to  do  even  for  sequential  analysis)  and  that  a  tensor  operation 
exists  for  the  weights. 
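
The second step can be illustrated with ordinary relational composition. The sketch below is a toy under strong assumptions that are ours, not the dissertation's: weights are dropped, threads have no stacks or local state (so each transducer degenerates to a finite relation on shared states), and contexts are scheduled round-robin. The function names are hypothetical.

```python
def compose(r1, r2):
    """Relational composition: (a, c) whenever (a, b) in r1 and (b, c) in r2."""
    by_source = {}
    for b, c in r2:
        by_source.setdefault(b, set()).add(c)
    return {(a, c) for a, b in r1 for c in by_source.get(b, ())}

def bounded_effect(thread_rels, num_contexts):
    """Net effect of num_contexts execution contexts, threads taking turns."""
    states = {s for rel in thread_rels for pair in rel for s in pair}
    effect = {(s, s) for s in states}  # identity: no context executed yet
    for i in range(num_contexts):
        effect = compose(effect, thread_rels[i % len(thread_rels)])
    return effect

# Each relation is reflexive, because a thread may take zero steps in a context.
t1 = {(0, 0), (1, 1), (2, 2), (0, 1)}  # thread 1 can move the shared state 0 -> 1
t2 = {(0, 0), (1, 1), (2, 2), (1, 2)}  # thread 2 can move the shared state 1 -> 2

after_one = {c for a, c in bounded_effect([t1, t2], 1) if a == 0}
after_two = {c for a, c in bounded_effect([t1, t2], 2) if a == 0}
```

State 2 becomes reachable only once the second thread gets a context, which is the kind of fact a bound on contexts exposes.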

A  topic  left  for  future  work  is  to  extend  these  results  to  EWPDSs  as  well.  The  difficulty 
lies  in  finding  a  tensor-like  operation  for  merge  functions. 

In Chapter 7, we gave a practical algorithm for CBA of concurrent programs. We showed
that given a concurrent program $P$ and a context bound $K$, one can create a sequential
program $P_K$ such that the analysis of $P_K$ is sufficient for CBA of $P$ under the bound $K$.
This reduction is a source-to-source transformation, and requires no assumptions or extra
work on the part of the user, except for the identification of thread-local data.

We implemented the technique on Boolean programs to create the first known implementation
of CBA. Using this tool, we conducted a study on concurrent Linux drivers and
showed that most bugs could be found (1) in a few context switches and (2) much faster
than previous approaches.

One interesting aspect of the reduction from the concurrent program $P$ to the sequential
program $P_K$ is how the program execution changes. An execution of $P$ is split into pieces
(one for each execution context), and then rearranged (so that the pieces for each thread
are put together). Extra checks are put in place to ensure that no spurious behaviors are
introduced. The technique of allowing actions to happen in a different order, but at the
same time using constraints to ensure that the semantics is preserved, allowed us to reduce
the complexity of CBA and make it linear in the size of the local state space. This technique
may be useful in contexts other than CBA as well.
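
The split-and-rearrange idea can be sketched on a toy finite-state model. The code below is not the source-to-source transformation of Chapter 7; it is an explicit-state illustration under our own assumptions: two threads, a fixed round-robin schedule with K contexts per thread, each thread given as a hypothetical step function over (global, local) pairs, and guesses enumerated by brute force rather than handled symbolically. `direct` interleaves the threads; `sequentialized` runs thread 1 through all of its contexts first (on guessed globals), then thread 2, and keeps only runs whose guesses pass the consistency check.

```python
from itertools import product

def closure(step, start):
    """All (global, local) pairs reachable by zero or more steps of one thread."""
    seen, work = {start}, [start]
    while work:
        g, l = work.pop()
        for nxt in step(g, l):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

def direct(step1, step2, g0, l1, l2, K):
    """Final globals under the interleaved schedule T1 T2 T1 T2 ... (K rounds)."""
    frontier = {(g0, l1, l2)}
    for _ in range(K):
        after_t1 = {(g2, a2, b) for g, a, b in frontier
                    for g2, a2 in closure(step1, (g, a))}
        frontier = {(g2, a, b2) for g, a, b in after_t1
                    for g2, b2 in closure(step2, (g, b))}
    return {g for g, _, _ in frontier}

def thread_runs(step, starts, l0):
    """Run one thread alone; context i begins with global starts[i].
    Returns the tuples of globals observed at the end of each context."""
    partial = {((), l0)}
    for g_start in starts:
        partial = {(ends + (g_end,), l_end) for ends, l in partial
                   for g_end, l_end in closure(step, (g_start, l))}
    return {ends for ends, _ in partial}

def sequentialized(step1, step2, all_globals, g0, l1, l2, K):
    """Same final globals, but each thread's contexts are executed together."""
    result = set()
    for guess in product(all_globals, repeat=K - 1):   # guessed g_2 .. g_K
        for ends1 in thread_runs(step1, (g0,) + guess, l1):
            for ends2 in thread_runs(step2, ends1, l2):
                if ends2[:-1] == guess:                # check: guesses were right
                    result.add(ends2[-1])
    return result

def step1(g, l):  # toy thread 1
    if l == 0:
        yield ((g + 1) % 4, 1)
    elif l == 1 and g == 2:
        yield (3, 2)

def step2(g, l):  # toy thread 2
    if g == 1 and l == 0:
        yield (2, 1)
```

In this toy, global state 3 is reachable only with two contexts per thread, and both explorations agree on that; the check `ends2[:-1] == guess` plays the role of the extra checks that rule out spurious behaviors.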

Follow-up  Work.  Because  our  reduction  is  a  source-to-source  transformation,  one 
can  apply  several  different  techniques,  developed  for  sequential  programs,  to  concurrent 
programs.  Our  implementation  of  CBA,  which  applies  to  Boolean  programs,  uses  a  BDD- 
based  solver.  In  a  follow-up  work  by  others,  our  reduction  was  extended  and  applied  to  C 
programs  [55].  They  used  an  SMT-based  solver  to  do  partial  verification  of  the  sequential 
program  produced  as  a  result  of  the  reduction. 

As  future  work,  it  would  be  interesting  to  study  further  extensions  of  our  reduction.  A 
key  factor  that  affects  scalability  is  the  size  of  shared  memory  because  the  shared  state  has 
to  be  recorded  at  each  context  switch.  In  languages  like  C,  the  shared  memory  cannot  be 
determined  statically.  Thus,  it  would  be  useful  to  design  a  technique  that  identifies  the 
shared  memory  on-the-fly  as  the  program  is  analyzed. 

In follow-up work by yet another group, our reduction was extended to a “lazy” reduction
[92]. The sequential program $P_K$ produced by their reduction has the property that the
analysis of $P_K$ permits a lazy analysis similar to the one we presented in Section 7.6.

Techniques  for  Weighted  Systems 

One of the themes of the work presented in the dissertation, and one of the areas in which
it makes a contribution that extends beyond program verification, is the development of
techniques and algorithms for manipulating weighted automata and weighted transducers.
Both WPDSs and EWPDSs are based on Pushdown Systems (PDSs), which provide a
convenient abstraction for the program’s runtime stack. Modeling the stack is important for
programs with procedures because it allows precise reasoning about the call-return semantics
of procedure calls.

Algorithms  for  PDSs  are  based  on  automata-theoretic  techniques.  The  set  of  all  reachable 
states  of  a  PDS  can  be  captured  using  a  finite-state  machine.  This  allows  one  to  leverage 
the  vast  existing  knowledge  about  finite-state  machines  to  design  various  analyses  of  PDSs. 
However,  the  situation  changes  when  dealing  with  WPDSs  and  EWPDSs.  The  set  of  all 
reachable  states  of  such  models  can  only  be  captured  using  weighted  automata.  Thus,  one 
no  longer  has  the  same  rich  collection  of  techniques  available  as  one  has  for  unweighted 
automata. 

This  dissertation  presented  several  new  algorithms  for  weighted  automata.  For  instance, 
in  Chapter  5,  we  gave  a  method  to  intersect  two  weighted  automata  under  the  restriction 
that  one  automaton  is  a  forward-weighted  automaton  and  the  other  is  a  backward-weighted 
automaton.  The  algorithm  allowed  us  to  intersect  the  set  of  forward-reachable  states  from 
the  start  of  the  program  with  the  backward-reachable  states  from  an  error  point  in  the 
program  to  compute  an  error  projection.  In  Chapter  6,  we  further  extended  this  result  to 
show  how  to  intersect  any  two  weighted  automata,  provided  that  a  tensor-product  operation 
exists  for  weights.  The  result  was  then  generalized  to  composition  of  weighted  transducers, 
which  provided  an  algorithm  for  CBA. 

These results on weighted automata are general and form the building blocks of some
of the verification techniques described in the dissertation. They may be useful for solving
other verification or program-analysis problems. Moreover, these results are of interest in
their own right, and should be applicable to problems outside the areas of verification and
program analysis.